This Rmarkdown script (together with the corresponding TAB-separated CSV input data files InfoRateData.csv and AutomaticSylDetect.csv, and the resulting HTML document) contains the full analysis and plotting code accompanying the paper Different languages, similar encoding efficiency: comparable information rates across the human communicative niche.
For more information on the data, please see Oh (2015). There are in total 17 languages (see the Table below).
The oral corpus is based on a subset of the Multext (Multilingual Text Tools and Corpora) parallel corpus (Campione & Véronis, 1998) in British English, German, and Italian. The material consists of 15 short texts of 3-5 semantically connected sentences carefully translated by a native speaker in each language.
For the other 14 languages, two of the authors supervised the translation and recording of new datasets. All participants were native speakers of the target language, with a focus on a specific variety of the language when possible (e.g. Mandarin spoken in Beijing, Serbian in Belgrade and Korean in Seoul). No strict control on age or on the speakers’ social diversity was performed, but speakers were mainly students or members of academic institutions. Speakers were asked to read each text three times (first silently and then aloud twice). The texts were presented one by one on the screen in random order, in a self-paced reading paradigm. This way, speakers familiarized themselves with the text and reduced their reading errors. The second recording read aloud was analyzed in this study.
With these, the oral corpus contains 2288 text readings produced by 170 speakers, covering 17 languages and 15 texts.
For each language and text we have the following number of syllables (NS):
| Language | Text | NS |
|---|---|---|
| CAT | O1 | 88 |
| CAT | O2 | 118 |
| CAT | O3 | 139 |
| CAT | O4 | 142 |
| CAT | O6 | 99 |
| CAT | O8 | 131 |
| CAT | O9 | 81 |
| CAT | P0 | 122 |
| CAT | P1 | 127 |
| CAT | P2 | 118 |
| CAT | P3 | 131 |
| CAT | P8 | 117 |
| CAT | P9 | 119 |
| CAT | Q0 | 127 |
| CAT | Q1 | 93 |
| CMN | O1 | 60 |
| CMN | O2 | 73 |
| CMN | O3 | 73 |
| CMN | O4 | 80 |
| CMN | O6 | 64 |
| CMN | O8 | 69 |
| CMN | O9 | 57 |
| CMN | P0 | 100 |
| CMN | P1 | 70 |
| CMN | P2 | 65 |
| CMN | P3 | 77 |
| CMN | P8 | 78 |
| CMN | P9 | 76 |
| CMN | Q0 | 68 |
| CMN | Q1 | 57 |
| DEU | O1 | 86 |
| DEU | O4 | 87 |
| DEU | O6 | 82 |
| DEU | O9 | 60 |
| DEU | P0 | 111 |
| DEU | Q0 | 113 |
| DEU | O2 | 117 |
| DEU | O3 | 86 |
| DEU | O8 | 115 |
| DEU | P1 | 105 |
| DEU | P2 | 90 |
| DEU | P3 | 106 |
| DEU | P8 | 87 |
| DEU | P9 | 92 |
| DEU | Q1 | 94 |
| ENG | O1 | 70 |
| ENG | O2 | 86 |
| ENG | O3 | 85 |
| ENG | O4 | 84 |
| ENG | O6 | 67 |
| ENG | O8 | 84 |
| ENG | O9 | 67 |
| ENG | P1 | 105 |
| ENG | P2 | 76 |
| ENG | P3 | 91 |
| ENG | Q0 | 88 |
| ENG | Q1 | 62 |
| ENG | P0 | 92 |
| ENG | P8 | 77 |
| ENG | P9 | 91 |
| EUS | O1 | 102 |
| EUS | O2 | 106 |
| EUS | O3 | 108 |
| EUS | O4 | 135 |
| EUS | O6 | 107 |
| EUS | O8 | 138 |
| EUS | O9 | 67 |
| EUS | P0 | 142 |
| EUS | P1 | 117 |
| EUS | P2 | 109 |
| EUS | P3 | 119 |
| EUS | P8 | 121 |
| EUS | P9 | 106 |
| EUS | Q0 | 121 |
| EUS | Q1 | 87 |
| FIN | O1 | 91 |
| FIN | O2 | 110 |
| FIN | O3 | 110 |
| FIN | O4 | 123 |
| FIN | O6 | 84 |
| FIN | O8 | 111 |
| FIN | O9 | 74 |
| FIN | P0 | 117 |
| FIN | P1 | 123 |
| FIN | P2 | 98 |
| FIN | P3 | 119 |
| FIN | P8 | 96 |
| FIN | P9 | 108 |
| FIN | Q0 | 113 |
| FIN | Q1 | 83 |
| FRA | O1 | 87 |
| FRA | O2 | 106 |
| FRA | O3 | 95 |
| FRA | O4 | 93 |
| FRA | O6 | 77 |
| FRA | O8 | 94 |
| FRA | O9 | 65 |
| FRA | P0 | 100 |
| FRA | P1 | 104 |
| FRA | P2 | 88 |
| FRA | P3 | 107 |
| FRA | P8 | 95 |
| FRA | P9 | 92 |
| FRA | Q0 | 99 |
| FRA | Q1 | 68 |
| HUN | O1 | 89 |
| HUN | O2 | 112 |
| HUN | O3 | 100 |
| HUN | O4 | 124 |
| HUN | O6 | 72 |
| HUN | O8 | 99 |
| HUN | O9 | 82 |
| HUN | P0 | 122 |
| HUN | P1 | 112 |
| HUN | P2 | 105 |
| HUN | P3 | 113 |
| HUN | P8 | 101 |
| HUN | P9 | 117 |
| HUN | Q0 | 103 |
| HUN | Q1 | 98 |
| ITA | O4 | 83 |
| ITA | O6 | 86 |
| ITA | O8 | 109 |
| ITA | O9 | 68 |
| ITA | P0 | 123 |
| ITA | P8 | 110 |
| ITA | P9 | 100 |
| ITA | Q1 | 106 |
| ITA | O1 | 89 |
| ITA | O2 | 113 |
| ITA | O3 | 100 |
| ITA | P1 | 109 |
| ITA | P2 | 117 |
| ITA | P3 | 111 |
| ITA | Q0 | 110 |
| JPN | O1 | 119 |
| JPN | O2 | 162 |
| JPN | O3 | 153 |
| JPN | O4 | 154 |
| JPN | O6 | 129 |
| JPN | O8 | 159 |
| JPN | O9 | 83 |
| JPN | P0 | 156 |
| JPN | P1 | 131 |
| JPN | P2 | 142 |
| JPN | P3 | 152 |
| JPN | P8 | 117 |
| JPN | P9 | 139 |
| JPN | Q0 | 150 |
| JPN | Q1 | 126 |
| KOR | O1 | 86 |
| KOR | O2 | 105 |
| KOR | O3 | 116 |
| KOR | O4 | 136 |
| KOR | O6 | 107 |
| KOR | O8 | 132 |
| KOR | O9 | 86 |
| KOR | P0 | 133 |
| KOR | P1 | 124 |
| KOR | P2 | 128 |
| KOR | P3 | 117 |
| KOR | P8 | 115 |
| KOR | P9 | 127 |
| KOR | Q0 | 112 |
| KOR | Q1 | 115 |
| SPA | O1 | 94 |
| SPA | O2 | 135 |
| SPA | O3 | 111 |
| SPA | O4 | 152 |
| SPA | O6 | 103 |
| SPA | O8 | 136 |
| SPA | O9 | 79 |
| SPA | P0 | 153 |
| SPA | P1 | 137 |
| SPA | P2 | 120 |
| SPA | P3 | 119 |
| SPA | P8 | 93 |
| SPA | P9 | 119 |
| SPA | Q0 | 126 |
| SPA | Q1 | 81 |
| SRP | O1 | 87 |
| SRP | O2 | 99 |
| SRP | O3 | 120 |
| SRP | O4 | 128 |
| SRP | O6 | 98 |
| SRP | O8 | 110 |
| SRP | O9 | 75 |
| SRP | P0 | 137 |
| SRP | P1 | 121 |
| SRP | P2 | 98 |
| SRP | P3 | 129 |
| SRP | P8 | 110 |
| SRP | P9 | 111 |
| SRP | Q0 | 102 |
| SRP | Q1 | 89 |
| THA | O1 | 64 |
| THA | O2 | 74 |
| THA | O3 | 85 |
| THA | O4 | 110 |
| THA | O6 | 78 |
| THA | O8 | 93 |
| THA | O9 | 60 |
| THA | P0 | 103 |
| THA | P1 | 96 |
| THA | P2 | 79 |
| THA | P3 | 95 |
| THA | P8 | 81 |
| THA | P9 | 77 |
| THA | Q0 | 73 |
| THA | Q1 | 56 |
| TUR | O1 | 108 |
| TUR | O2 | 139 |
| TUR | O3 | 120 |
| TUR | O4 | 143 |
| TUR | O6 | 99 |
| TUR | O8 | 130 |
| TUR | O9 | 79 |
| TUR | P0 | 102 |
| TUR | P1 | 102 |
| TUR | P2 | 116 |
| TUR | P3 | 157 |
| TUR | P8 | 94 |
| TUR | P9 | 128 |
| TUR | Q0 | 107 |
| TUR | Q1 | 88 |
| VIE | O1 | 49 |
| VIE | O2 | 93 |
| VIE | O3 | 55 |
| VIE | O4 | 89 |
| VIE | O6 | 80 |
| VIE | O8 | 81 |
| VIE | O9 | 52 |
| VIE | P0 | 102 |
| VIE | P1 | 77 |
| VIE | P2 | 81 |
| VIE | P3 | 90 |
| VIE | P8 | 75 |
| VIE | P9 | 61 |
| VIE | Q0 | 56 |
| VIE | Q1 | 55 |
| YUE | O1 | 59 |
| YUE | O2 | 80 |
| YUE | O3 | 96 |
| YUE | O4 | 69 |
| YUE | O6 | 80 |
| YUE | O8 | 92 |
| YUE | O9 | 56 |
| YUE | P0 | 102 |
| YUE | P1 | 88 |
| YUE | P2 | 78 |
| YUE | P3 | 83 |
| YUE | P8 | 77 |
| YUE | P9 | 90 |
| YUE | Q0 | 88 |
| YUE | Q1 | 73 |
Text datasets were acquired from various sources as illustrated in the Table below. After an initial data curation, each dataset was phonetically transcribed and automatically syllabified by a rule-based program written by one of the authors, except in the following cases:
Additionally, no syllabification was required for Sino-Tibetan languages (Cantonese and Mandarin Chinese) since one ideogram corresponds to one syllable.
| Language | Family | ISO 639-3 | Corpus |
|---|---|---|---|
| Basque | Basque | EUS | E-Hitz (Perea et al., 2006) |
| British English | Indo-European | ENG | WebCelex (MPI for Psycholinguistics) |
| Cantonese | Sino-Tibetan | YUE | A linguistic corpus of mid-20th century Hong Kong Cantonese |
| Catalan | Indo-European | CAT | Frequency dictionary (Zséder et al., 2012) |
| Finnish | Uralic | FIN | Finnish Parole Corpus |
| French | Indo-European | FRA | Lexique 3.80 (New et al., 2001) |
| German | Indo-European | DEU | WebCelex (MPI for Psycholinguistics) |
| Hungarian | Uralic | HUN | Hungarian National Corpus (Váradi, 2002) |
| Italian | Indo-European | ITA | The Corpus PAISÀ (Lyding et al., 2014) |
| Japanese | Japanese | JPN | Japanese Internet Corpus (Sharoff, 2006) |
| Korean | Korean | KOR | Leipzig Corpora Collection (LCC) |
| Mandarin Chinese | Sino-Tibetan | CMN | Chinese Internet Corpus (Sharoff, 2006) |
| Serbian | Indo-European | SRP | Frequency dictionary (Zséder et al., 2012) |
| Spanish | Indo-European | SPA | Frequency dictionary (Zséder et al., 2012) |
| Thai | Tai-Kadai | THA | Thai National Corpus (TNC) |
| Turkish | Turkic | TUR | Leipzig Corpora Collection (LCC) |
| Vietnamese | Austroasiatic | VIE | VNSpeechCorpus (Le et al., 2004) |
The data is structured as follows:
We use sum contrasts throughout for the factor IVs; these are zero-sum contrasts that compare every level of the IV to the overall mean (for example, for a two-level factor such as Sex we do not compare Males with Females, but each with their overall mean, which is included in the intercept). However, the level names produced by the R function contr.sum() used to define these contrasts are very uninformative, so we make them explicit below (please note that in the model outputs the last level is usually not shown):
Sex1 = F, Sex2 = M (the last level, Sex2, is usually not displayed);
Text1 = O1, Text2 = O2, Text3 = O3, Text4 = O4, Text5 = O6, Text6 = O8, Text7 = O9, Text8 = P0, Text9 = P1, Text10 = P2, Text11 = P3, Text12 = P8, Text13 = P9, Text14 = Q0, Text15 = Q1 (the last level, Text15, is usually not displayed);
Language1 = CAT, Language2 = CMN, Language3 = DEU, Language4 = ENG, Language5 = EUS, Language6 = FIN, Language7 = FRA, Language8 = HUN, Language9 = ITA, Language10 = JPN, Language11 = KOR, Language12 = SPA, Language13 = SRP, Language14 = THA, Language15 = TUR, Language16 = VIE, Language17 = YUE (the last level, Language17, is usually not displayed);
Family1 = Austroasiatic, Family2 = Basque, Family3 = Indo-European, Family4 = Japanese, Family5 = Korean, Family6 = Sino-Tibetan, Family7 = Tai-Kadai, Family8 = Turkic, Family9 = Uralic (the last level, Family9, is usually not displayed).
| Lng | # spkrs | % fem | # age | mean(age) | sd(age) | actual ages |
|---|---|---|---|---|---|---|
| CAT | 10 | 50 | 10 | 35.4 | 9.2 | (21, 28, 28, 29, 31, 39, 42, 42, 44, 50) |
| CMN | 10 | 50 | 9 | 23.1 | 4.5 | (19, 19, 19, 19, 24, 24, 25, 28, 31) |
| DEU | 10 | 50 | 0 | NaN | NaN | () |
| ENG | 10 | 50 | 0 | NaN | NaN | () |
| EUS | 10 | 50 | 10 | 28.0 | 4.9 | (19, 22, 26, 27, 28, 29, 30, 31, 32, 36) |
| FIN | 10 | 50 | 10 | 33.2 | 11.0 | (16, 22, 26, 28, 30, 35, 37, 41, 45, 52) |
| FRA | 10 | 50 | 10 | 32.5 | 7.7 | (24, 25, 25, 27, 28, 36, 36, 37, 41, 46) |
| HUN | 10 | 50 | 10 | 39.3 | 15.8 | (17, 27, 27, 31, 33, 39, 42, 51, 57, 69) |
| ITA | 10 | 50 | 0 | NaN | NaN | () |
| JPN | 10 | 50 | 10 | 30.6 | 12.8 | (20, 20, 21, 22, 22, 28, 29, 40, 51, 53) |
| KOR | 10 | 50 | 10 | 28.6 | 10.6 | (16, 19, 19, 19, 28, 31, 33, 35, 36, 50) |
| SPA | 10 | 50 | 10 | 33.7 | 10.1 | (21, 22, 26, 28, 30, 32, 42, 42, 44, 50) |
| SRP | 10 | 50 | 10 | 30.6 | 7.8 | (19, 21, 23, 30, 31, 32, 34, 34, 38, 44) |
| THA | 10 | 50 | 10 | 30.1 | 5.7 | (23, 23, 27, 28, 30, 31, 31, 32, 33, 43) |
| TUR | 10 | 50 | 7 | 32.6 | 7.2 | (24, 25, 30, 31, 37, 37, 44) |
| VIE | 10 | 50 | 6 | 27.2 | 4.1 | (21, 25, 26, 28, 31, 32) |
| YUE | 10 | 50 | 10 | 22.0 | 1.5 | (20, 20, 21, 21, 22, 22, 23, 23, 24, 24) |
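To make the sum-contrast coding described before the speaker table more concrete, here is a minimal sketch using base R's `contr.sum()` on a two-level factor. The toy `sex` vector and the `sex_key` lookup are invented for illustration and are not taken from the analysis script.

```r
# Toy factor (illustrative values only, not the study data)
sex <- factor(c("F", "M", "F", "M"), levels = c("F", "M"))

# Assign sum (deviation) contrasts: each level is coded relative to the
# overall mean, which is absorbed by the intercept
contrasts(sex) <- contr.sum(nlevels(sex))
cm <- contrasts(sex)
print(cm)  # F is coded 1, M is coded -1

# With sum contrasts R labels coefficients by position (Sex1, Sex2, ...),
# so an explicit lookup from coefficient names to level labels is handy
sex_key <- setNames(levels(sex), paste0("Sex", seq_along(levels(sex))))
print(sex_key)
```

The same pattern would apply to Text, Language and Family, with the last level being the one that does not get its own coefficient in the model output.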
NS: exploratory plots.
mean=101.196, median=100, sd=24.703, CV=0.244, min=49, max=162, kurtosis=2.484, skewness=0.177.
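The summary line above can be reproduced with a few lines of base R. The sketch below is an assumption about how such a summary might be computed: `describe()` is a hypothetical helper, and it assumes that kurtosis is reported as the raw (non-excess) moment ratio, for which a normal distribution gives 3.

```r
# Moment-based descriptive summary (base R only).
# Assumptions: kurtosis = m4 / m2^2 (non-excess), skewness = m3 / m2^(3/2),
# where mk is the k-th central moment with denominator n.
describe <- function(x) {
  x <- x[!is.na(x)]
  m  <- mean(x)
  m2 <- mean((x - m)^2)
  m3 <- mean((x - m)^3)
  m4 <- mean((x - m)^4)
  c(mean = m, median = median(x), sd = sd(x), CV = sd(x) / m,
    min = min(x), max = max(x),
    kurtosis = m4 / m2^2, skewness = m3 / m2^1.5)
}

# Usage on the 15 Catalan (CAT) syllable counts from the NS table
ns_cat <- c(88, 118, 139, 142, 99, 131, 81, 122, 127, 118, 131, 117, 119, 127, 93)
s_cat <- describe(ns_cat)
print(round(s_cat, 3))
```

Applied to the full NS column instead of one language, the same helper would be expected to give values comparable to the line above, provided the kurtosis and skewness conventions match.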
SR: exploratory plots.
SR per speaker.
SR by Sex and Age across Languages.
SR by Sex, Age and Language.
SR by language.
mean=6.631, median=6.777, sd=1.148, CV=0.173, min=3.589, max=9.492, kurtosis=2.408, skewness=-0.168.
ShE and ID: exploratory plots.
ShE vs ID.
Pearson's product-moment correlation
data: tmp1$ShE and tmp1$ID
t = 2.0326, df = 15, p-value = 0.06019
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.02052914 0.77274779
sample estimates:
cor
0.4647009
Spearman's rank correlation rho
data: tmp1$ShE and tmp1$ID
S = 451.88, p-value = 0.07259
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.4462208
Paired t-test
data: tmp1$ShE and tmp1$ID
t = 11.635, df = 16, p-value = 3.213e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
2.158040 3.119607
sample estimates:
mean of the differences
2.638824
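The three tests above correspond to standard base R calls. The following sketch shows their general shape on a synthetic stand-in for `tmp1` (one row per language, 17 rows); the simulated values and the name `tmp1_demo` are assumptions made for illustration and do not reproduce the reported statistics.

```r
# Synthetic stand-in for `tmp1`: one row per language (17 rows).
# Values are simulated for illustration and are NOT the study data.
set.seed(1)
tmp1_demo <- data.frame(ShE = rnorm(17, mean = 8.6, sd = 0.9))
tmp1_demo$ID <- tmp1_demo$ShE - 2.6 + rnorm(17, sd = 0.9)

# Linear and rank-based association between the two measures
pearson  <- cor.test(tmp1_demo$ShE, tmp1_demo$ID, method = "pearson")
spearman <- cor.test(tmp1_demo$ShE, tmp1_demo$ID, method = "spearman",
                     exact = FALSE)

# Paired comparison: does ShE differ systematically from ID within a language?
paired <- t.test(tmp1_demo$ShE, tmp1_demo$ID, paired = TRUE)

# Degrees of freedom follow from n = 17: n - 2 for Pearson, n - 1 for the
# paired t-test, matching df = 15 and df = 16 in the output above
dfs <- c(pearson_df = unname(pearson$parameter),
         paired_df  = unname(paired$parameter))
print(dfs)
```

The paired test is appropriate here because each language contributes one ShE and one ID value, so the two samples are not independent.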
ShE:
mean=8.621, median=8.69, sd=0.904, CV=0.105, min=6.07, max=9.83, kurtosis=4.665, skewness=-1.122.
ID:
mean=6.009, median=5.56, sd=0.883, CV=0.147, min=4.83, max=8.02, kurtosis=2.53, skewness=0.747.
ShIR: exploratory plots.
ShIR per speaker.
ShIR by Sex and Age across Languages.
ShIR by Sex, Age and Language.
ShIR by language.
IR: exploratory plots.
IR per speaker.
IR by Sex and Age across Languages.
IR by Sex, Age and Language.
IR by language.
ShIR:
mean=56.709, median=57.207, sd=9.35, CV=0.165, min=32.772, max=89.235, kurtosis=2.444, skewness=0.079.
IR:
mean=39.153, median=39.13, sd=5.097, CV=0.13, min=25.631, max=60.692, kurtosis=3.622, skewness=0.325.
SR vs ID
Pearson's product-moment correlation
data: info.rate.data$SR and info.rate.data$ID
t = -45.329, df = 2286, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.7090031 -0.6658066
sample estimates:
cor
-0.6880138
Spearman's rank correlation rho
data: info.rate.data$SR and info.rate.data$ID
S = 3393600000, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
-0.6999614
| Level-1 factor (f) | ICC |
|---|---|
| Text | 0.26 |
| Family | 0.42 |
| Language | 0.19 |
| Speaker | 0.00 |
| Level-1 factor (f) | ICC |
|---|---|
| Text | 0.01 |
| Family | 0.50 |
| Language | 0.19 |
| Speaker | 0.24 |
| Level-1 factor (f) | ICC |
|---|---|
| Text | 0.01 |
| Family | 0.47 |
| Language | 0.13 |
| Speaker | 0.30 |
| Level-1 factor (f) | ICC |
|---|---|
| Text | 0.02 |
| Family | 0.03 |
| Language | 0.32 |
| Speaker | 0.49 |
| model | AIC | BIC |
|---|---|---|
| 1 + (1 \| Text) + (1 \| Family/Language) + (1 \| Speaker) | 2127.51 | 2161.93 |
| 1 + (1 \| Family/Language) + (1 \| Speaker) | 2397.97 | 2426.65 |
| 1 + (1 \| Text) + (1 \| Speaker) | 2257.73 | 2280.68 |
| 1 + (1 \| Text) + (1 \| Family/Language) | 4717.06 | 4745.73 |
| 1 + Sex + (1 \| Text) + (1 \| Family/Language) + (1 \| Speaker) | 2121.04 | 2161.19 |
| 1 + Sex + (1 \| Family/Language) + (1 \| Speaker) | 2391.29 | 2425.7 |
| 1 + Sex + (1 \| Text) + (1 \| Speaker) | 2258.63 | 2287.31 |
| 1 + Sex + (1 \| Text) + (1 \| Family/Language) | 4584.11 | 4618.52 |
We consider here the full model SR ~ 1 + Sex + (1|Text) + (1|Family/Language) + (1|Speaker).
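As a small worked check of the SR comparison table, the sketch below re-enters the AIC and BIC values from that table and ranks the candidate models. The data frame `sr_models` and the `dAIC`/`dBIC` columns are introduced here for illustration only.

```r
# AIC/BIC values copied from the SR model-comparison table
sr_models <- data.frame(
  model = c(
    "1 + (1|Text) + (1|Family/Language) + (1|Speaker)",
    "1 + (1|Family/Language) + (1|Speaker)",
    "1 + (1|Text) + (1|Speaker)",
    "1 + (1|Text) + (1|Family/Language)",
    "1 + Sex + (1|Text) + (1|Family/Language) + (1|Speaker)",
    "1 + Sex + (1|Family/Language) + (1|Speaker)",
    "1 + Sex + (1|Text) + (1|Speaker)",
    "1 + Sex + (1|Text) + (1|Family/Language)"
  ),
  AIC = c(2127.51, 2397.97, 2257.73, 4717.06, 2121.04, 2391.29, 2258.63, 4584.11),
  BIC = c(2161.93, 2426.65, 2280.68, 4745.73, 2161.19, 2425.70, 2287.31, 4618.52),
  stringsAsFactors = FALSE
)

# Differences to the best model (0 = best); lower is better for both criteria
sr_models$dAIC <- sr_models$AIC - min(sr_models$AIC)
sr_models$dBIC <- sr_models$BIC - min(sr_models$BIC)

best_aic <- sr_models$model[which.min(sr_models$AIC)]
best_bic <- sr_models$model[which.min(sr_models$BIC)]
print(sr_models[order(sr_models$AIC), c("model", "dAIC", "dBIC")])
```

Both criteria are lowest for the full model with Sex, and dropping the Speaker random effect produces by far the largest increase in AIC and BIC.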
| model | AIC | BIC |
|---|---|---|
| 1 + (1 \| Text) + (1 \| Family/Language) + (1 \| Speaker) | 10257.84 | 10292.25 |
| 1 + (1 \| Family/Language) + (1 \| Speaker) | 10524.5 | 10553.18 |
| 1 + (1 \| Text) + (1 \| Speaker) | 10305.76 | 10328.71 |
| 1 + (1 \| Text) + (1 \| Family/Language) | 12888.49 | 12917.17 |
| 1 + Sex + (1 \| Text) + (1 \| Family/Language) + (1 \| Speaker) | 10247.34 | 10287.49 |
| 1 + Sex + (1 \| Family/Language) + (1 \| Speaker) | 10513.79 | 10548.21 |
| 1 + Sex + (1 \| Text) + (1 \| Speaker) | 10300.05 | 10328.72 |
| 1 + Sex + (1 \| Text) + (1 \| Family/Language) | 12747.04 | 12781.45 |
We consider here the full model IR ~ 1 + Sex + (1|Text) + (1|Family/Language) + (1|Speaker).
We will use a Gaussian distribution (with fixed or modelled variance).
******************************************************************
Summary of the Quantile Residuals
mean = -7.131581e-05
variance = 1.000437
coef. of skewness = 0.04303609
coef. of kurtosis = 3.710291
Filliben correlation coefficient = 0.9979187
******************************************************************
Deviance= 1177.818
AIC= 1576.98
******************************************************************
Summary of the Quantile Residuals
mean = 0.002001851
variance = 1.000433
coef. of skewness = 0.03014483
coef. of kurtosis = 2.864355
Filliben correlation coefficient = 0.9994151
******************************************************************
Deviance= 815.9172
AIC= 1405.705
The distribution of the residuals is less heteroscedastic than before, and the fit to the data is better. The full summary of the model is:
******************************************************************
Family: c("NO", "Normal")
Call: gamlss(formula = SR ~ 1 + Sex + random(Text) + random(Language) + random(Family) + random(Speaker), sigma.formula = ~1 + Sex + random(Text) + random(Language) + random(Family) +
random(Speaker), family = NO(mu.link = "identity"), data = d, control = gamlss.control(n.cyc = 800, trace = FALSE), i.control = glim.control(bf.cyc = 800))
Fitting method: RS()
------------------------------------------------------------------
Mu link function: identity
Mu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.629552 0.005816 1139.83 <2e-16 ***
Sex1 -0.168157 0.005816 -28.91 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
Sigma link function: log
Sigma Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.23693 0.01478 -83.673 < 2e-16 ***
Sex1 -0.05788 0.01478 -3.915 9.34e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
NOTE: Additive smoothing terms exist in the formulas:
i) Std. Error for smoothers are for the linear effect only.
ii) Std. Error for the linear terms maybe are not accurate.
------------------------------------------------------------------
No. of observations in the fit: 2288
Degrees of Freedom for the fit: 294.8937
Residual Deg. of Freedom: 1993.106
at cycle: 53
Global Deviance: 815.9172
AIC: 1405.705
SBC: 3097.048
******************************************************************
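The AIC and SBC reported in this summary follow directly from the global deviance and the effective degrees of freedom. A minimal sketch of that arithmetic is given below; `gaic_from_deviance()` is a hypothetical helper written for this illustration, not a function from the gamlss package.

```r
# Generalised AIC: deviance plus a penalty of k per effective degree of freedom
gaic_from_deviance <- function(deviance, edf, k = 2) deviance + k * edf

# Values copied from the model summary above
dev  <- 815.9172   # Global Deviance
edf  <- 294.8937   # Degrees of Freedom for the fit
nobs <- 2288       # No. of observations in the fit

aic <- gaic_from_deviance(dev, edf, k = 2)          # AIC uses k = 2
sbc <- gaic_from_deviance(dev, edf, k = log(nobs))  # SBC (BIC) uses k = log(n)
print(round(c(AIC = aic, SBC = sbc), 3))
```

Because the random-effect terms contribute fractional effective degrees of freedom, the penalty here is much larger than the count of fixed-effect coefficients alone would suggest, which is why SBC is so much larger than AIC for these fits.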
Text
Random effects fit using the gamlss function random()
Degrees of Freedom for the fit : 14.39185
Random effect parameter sigma_b: 0.109103
Smoothing parameter lambda : 84.5402
Language
Random effects fit using the gamlss function random()
Degrees of Freedom for the fit : 16.98344
Random effect parameter sigma_b: 0.870621
Smoothing parameter lambda : 1.32916
Family
Random effects fit using the gamlss function random()
Degrees of Freedom for the fit : 0.01237646
Random effect parameter sigma_b: 1.91981e-05
Smoothing parameter lambda : 2713240000
Speaker
Random effects fit using the gamlss function random()
Degrees of Freedom for the fit : 165.4211
Random effect parameter sigma_b: 0.569042
Smoothing parameter lambda : 3.32891
Text
Random effects fit using the gamlss function random()
Degrees of Freedom for the fit : 0.00152528
Random effect parameter sigma_b: 0.000518285
Smoothing parameter lambda : 3475370
Language
Random effects fit using the gamlss function random()
Degrees of Freedom for the fit : 14.62296
Random effect parameter sigma_b: 0.153319
Smoothing parameter lambda : 39.9698
Family
Random effects fit using the gamlss function random()
Degrees of Freedom for the fit : 0.001713103
Random effect parameter sigma_b: 0.000184844
Smoothing parameter lambda : 27323100
Speaker
Random effects fit using the gamlss function random()
Degrees of Freedom for the fit : 79.45882
Random effect parameter sigma_b: 0.181484
Smoothing parameter lambda : 29.3639
******************************************************************
Summary of the Quantile Residuals
mean = 8.557955e-05
variance = 1.000437
coef. of skewness = 0.09721211
coef. of kurtosis = 3.684191
Filliben correlation coefficient = 0.9978017
******************************************************************
Deviance= 9322.636
AIC= 9721.861
******************************************************************
Summary of the Quantile Residuals
mean = 0.002104815
variance = 1.000442
coef. of skewness = 0.03139581
coef. of kurtosis = 2.864031
Filliben correlation coefficient = 0.9993924
******************************************************************
Deviance= 8961.553
AIC= 9554.321
Again, this is a better fit to the data. The full summary of the model is:
******************************************************************
Family: c("NO", "Normal")
Call: gamlss(formula = IR ~ 1 + Sex + random(Text) + random(Language) + random(Family) + random(Speaker), sigma.formula = ~1 + Sex + random(Text) + random(Language) + random(Family) +
random(Speaker), family = NO(mu.link = "identity"), data = d, control = gamlss.control(n.cyc = 800, trace = FALSE), i.control = glim.control(bf.cyc = 800))
Fitting method: RS()
------------------------------------------------------------------
Mu link function: identity
Mu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 39.14451 0.03463 1130.31 <2e-16 ***
Sex1 -1.01064 0.03463 -29.18 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
Sigma link function: log
Sigma Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.54554 0.01478 36.903 < 2e-16 ***
Sex1 -0.05935 0.01478 -4.015 6.17e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
NOTE: Additive smoothing terms exist in the formulas:
i) Std. Error for smoothers are for the linear effect only.
ii) Std. Error for the linear terms maybe are not accurate.
------------------------------------------------------------------
No. of observations in the fit: 2288
Degrees of Freedom for the fit: 296.3836
Residual Deg. of Freedom: 1991.616
at cycle: 6
Global Deviance: 8961.553
AIC: 9554.321
SBC: 11254.21
******************************************************************
Text
Random effects fit using the gamlss function random()
Degrees of Freedom for the fit : 14.41202
Random effect parameter sigma_b: 0.660353
Smoothing parameter lambda : 2.30777
Language
Random effects fit using the gamlss function random()
Degrees of Freedom for the fit : 16.95235
Random effect parameter sigma_b: 3.10086
Smoothing parameter lambda : 0.104779
Family
Random effects fit using the gamlss function random()
Degrees of Freedom for the fit : 0.01908253
Random effect parameter sigma_b: 0.000233421
Smoothing parameter lambda : 18354000
Speaker
Random effects fit using the gamlss function random()
Degrees of Freedom for the fit : 165.2917
Random effect parameter sigma_b: 3.39374
Smoothing parameter lambda : 0.0935855
Text
Random effects fit using the gamlss function random()
Degrees of Freedom for the fit : 0.0004688834
Random effect parameter sigma_b: 0.000282878
Smoothing parameter lambda : 11664800
Language
Random effects fit using the gamlss function random()
Degrees of Freedom for the fit : 14.73159
Random effect parameter sigma_b: 0.157607
Smoothing parameter lambda : 37.8208
Family
Random effects fit using the gamlss function random()
Degrees of Freedom for the fit : 0.001453389
Random effect parameter sigma_b: 0.00011236
Smoothing parameter lambda : 73935500
Speaker
Random effects fit using the gamlss function random()
Degrees of Freedom for the fit : 80.97495
Random effect parameter sigma_b: 0.184921
Smoothing parameter lambda : 28.2979
Let’s model SR with ID as an additional predictor (fixed effect) interacting with Sex. N.B. In this case, we must drop Language as a random effect, since each language has, by definition, only one value of ID.
******************************************************************
Family: c("NO", "Normal")
Call: gamlss(formula = SR ~ 1 + ID * Sex + random(Text) + random(Speaker) + random(Family), sigma.formula = ~1 + ID + Sex + random(Text) + random(Speaker) + random(Family),
family = NO(mu.link = "identity"), data = d, control = gamlss.control(n.cyc = 800, trace = FALSE), i.control = glim.control(bf.cyc = 800))
Fitting method: RS()
------------------------------------------------------------------
Mu link function: identity
Mu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.970094 0.037525 318.987 < 2e-16 ***
ID -0.888703 0.005992 -148.324 < 2e-16 ***
Sex1 -0.062504 0.037524 -1.666 0.09593 .
ID:Sex1 -0.017079 0.005991 -2.851 0.00441 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
Sigma link function: log
Sigma Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.73603 0.10289 -7.153 1.18e-12 ***
ID -0.08460 0.01694 -4.993 6.45e-07 ***
Sex1 -0.05723 0.01478 -3.871 0.000112 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
NOTE: Additive smoothing terms exist in the formulas:
i) Std. Error for smoothers are for the linear effect only.
ii) Std. Error for the linear terms maybe are not accurate.
------------------------------------------------------------------
No. of observations in the fit: 2288
Degrees of Freedom for the fit: 293.3745
Residual Deg. of Freedom: 1994.626
at cycle: 8
Global Deviance: 790.2025
AIC: 1376.951
SBC: 3059.581
******************************************************************
******************************************************************
Summary of the Quantile Residuals
mean = 0.001905342
variance = 1.000433
coef. of skewness = 0.01947475
coef. of kurtosis = 2.77394
Filliben correlation coefficient = 0.9993355
******************************************************************
Deviance= 790.2025
AIC= 1376.951
Adding ID as a predictor improves the fit as judged by AIC (1376.95, versus 1405.71 for the model above without ID). The estimate for ID is negative, but its significance is difficult to assess in a GAMLSS model involving smoothing functions. However, a simple lmer model also shows a significant effect of ID:
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: SR ~ 1 + ID * Sex + (1 | Text) + (1 | Speaker) + (1 | Family)
Data: info.rate.data
REML criterion at convergence: 2121.2
Scaled residuals:
Min 1Q Median 3Q Max
-3.7806 -0.6226 0.0176 0.5808 5.1707
Random effects:
Groups Name Variance Std.Dev.
Speaker (Intercept) 0.4618 0.6795
Text (Intercept) 0.0172 0.1311
Family (Intercept) 0.2443 0.4942
Residual 0.1063 0.3260
Number of obs: 2288, groups: Speaker, 170; Text, 15; Family, 9
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 10.521343 0.637420 42.244565 16.506 < 2e-16 ***
ID -0.658948 0.101261 54.414768 -6.507 2.52e-08 ***
Sex1 -0.128232 0.367323 155.446204 -0.349 0.727
ID:Sex1 -0.006959 0.060377 155.565771 -0.115 0.908
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) ID Sex1
ID -0.959
Sex1 0.002 -0.002
ID:Sex1 -0.002 0.002 -0.990
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
ID 4.4998 4.4998 1 54.415 42.3465 2.516e-08 ***
Sex 0.0130 0.0130 1 155.446 0.1219 0.7275
ID:Sex 0.0014 0.0014 1 155.566 0.0133 0.9084
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
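One way to read the random-effects block of the lmer output above is as shares of the total variance. The sketch below does this arithmetic on the reported variance components; it is an illustration only, and is not necessarily how the ICC tables earlier in this document were computed.

```r
# Variance components copied from the "Random effects" block of the lmer output
vc <- c(Speaker = 0.4618, Text = 0.0172, Family = 0.2443, Residual = 0.1063)

# Share of the total (random + residual) variance attributed to each term
share <- vc / sum(vc)
print(round(share, 3))

# Term with the largest share
largest <- names(which.max(share))
print(largest)
```

With ID in the fixed part and Language dropped from the random part, differences between speakers account for the largest share of the remaining variance (roughly 56%), followed by Family.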
Text
Random effects fit using the gamlss function random()
Degrees of Freedom for the fit : 14.41708
Random effect parameter sigma_b: 0.11065
Smoothing parameter lambda : 82.1944
Speaker
Random effects fit using the gamlss function random()
Degrees of Freedom for the fit : 167.4872
Random effect parameter sigma_b: 0.7577
Smoothing parameter lambda : 1.8794
Family
Random effects fit using the gamlss function random()
Degrees of Freedom for the fit : 0.01763365
Random effect parameter sigma_b: 0.000168856
Smoothing parameter lambda : 35072900
Text
Random effects fit using the gamlss function random()
Degrees of Freedom for the fit : 0.03217714
Random effect parameter sigma_b: 0.00241144
Smoothing parameter lambda : 152754
Speaker
Random effects fit using the gamlss function random()
Degrees of Freedom for the fit : 104.4197
Random effect parameter sigma_b: 0.242522
Smoothing parameter lambda : 15.8243
Family
Random effects fit using the gamlss function random()
Degrees of Freedom for the fit : 0.000711841
Random effect parameter sigma_b: 0.000279392
Smoothing parameter lambda : 11379200
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: SR ~ Age * Sex + (1 | Text) + (1 | Language)
Data: info.rate.data
REML criterion at convergence: 3743.7
Scaled residuals:
Min 1Q Median 3Q Max
-3.4947 -0.6305 0.0105 0.5978 3.6970
Random effects:
Groups Name Variance Std.Dev.
Text (Intercept) 0.01299 0.1140
Language (Intercept) 1.01328 1.0066
Residual 0.36283 0.6024
Number of obs: 1979, groups: Text, 15; Language, 14
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 6.809e+00 2.753e-01 1.417e+01 24.734 4.61e-13 ***
Age -5.972e-03 1.592e-03 1.951e+03 -3.751 0.000181 ***
Sex1 -1.084e-01 4.608e-02 1.948e+03 -2.351 0.018804 *
Age:Sex1 -1.223e-03 1.440e-03 1.949e+03 -0.849 0.395965
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) Age Sex1
Age -0.176
Sex1 0.018 -0.107
Age:Sex1 -0.019 0.115 -0.956
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: IR ~ Age * Sex + (1 | Text) + (1 | Family/Language)
Data: info.rate.data
REML criterion at convergence: 10792.3
Scaled residuals:
Min 1Q Median 3Q Max
-3.2256 -0.6274 0.0068 0.5901 4.6601
Random effects:
Groups Name Variance Std.Dev.
Text (Intercept) 0.4338 0.6586
Language:Family (Intercept) 8.4982 2.9152
Family (Intercept) 2.1402 1.4629
Residual 12.9843 3.6034
Number of obs: 1979, groups: Text, 15; Language:Family, 14; Family, 9
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 3.996e+01 1.009e+00 1.049e+01 39.596 9.13e-13 ***
Age -3.985e-02 9.517e-03 1.954e+03 -4.187 2.95e-05 ***
Sex1 -6.705e-01 2.756e-01 1.950e+03 -2.433 0.0151 *
Age:Sex1 -7.286e-03 8.615e-03 1.950e+03 -0.846 0.3978
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) Age Sex1
Age -0.285
Sex1 0.028 -0.106
Age:Sex1 -0.030 0.114 -0.956
So, it seems Age and Sex are both worth including in our models (even if we have to discard quite a bit of data because of missing Age info). (In fact, the effect of Age seems more significant than that of Sex.)
In the following, we investigate if Age does matter when using GAMLSS modelling…
Because there is missing data for Age, and because the GAMLSS models require no missing data, we will fit the models with Age (and its interaction with Sex) on the subset of the data that contains only those speakers with Age info. To make the models comparable, we also fit the same models without Age on the exact same subset of the data.
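The subsetting step described above can be sketched as follows. The toy data frame `d_demo` is invented for illustration, and plain `lm()` is used purely as a stand-in so that the example runs without extra packages; the models in this document are GAMLSS fits with random effects.

```r
# Toy data standing in for `info.rate.data` (invented values, two readings
# per speaker; speakers s2 and s5 have no Age information)
d_demo <- data.frame(
  Speaker = rep(c("s1", "s2", "s3", "s4", "s5"), each = 2),
  Sex     = rep(c("F", "M", "M", "F", "F"), each = 2),
  Age     = rep(c(25, NA, 31, 40, NA), each = 2),
  SR      = c(6.1, 6.4, 7.0, 6.8, 6.9, 7.2, 5.9, 6.2, 6.0, 6.3)
)

# Keep only the readings whose speaker has a known Age
d_age <- d_demo[!is.na(d_demo$Age), ]

# Fit the models with and without Age on the SAME subset, so that their
# deviance and AIC refer to the same observations
m_without_age <- lm(SR ~ Sex, data = d_age)
m_with_age    <- lm(SR ~ Sex + Age, data = d_age)

print(c(n_all = nrow(d_demo), n_used = nrow(d_age)))
print(AIC(m_without_age, m_with_age))
```

Comparing a model with Age fitted on the reduced data against a model without Age fitted on the full data would mix two different samples, so the information criteria would not be comparable.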
******************************************************************
Summary of the Quantile Residuals
mean = -8.252732e-05
variance = 1.000506
coef. of skewness = 0.04585
coef. of kurtosis = 3.821224
Filliben correlation coefficient = 0.9973638
******************************************************************
The model including Age * Sex is:
******************************************************************
Family: c("NO", "Normal")
Call: gamlss(formula = SR ~ 1 + Sex * Age + random(Text) + random(Language) + random(Family) + random(Speaker), family = NO(mu.link = "identity"), data = info.rate.data.for.age, control = gamlss.control(n.cyc = 800, trace = FALSE),
i.control = glim.control(bf.cyc = 800))
Fitting method: RS()
------------------------------------------------------------------
Mu link function: identity
Mu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.7419779 0.0234704 287.254 < 2e-16 ***
Sex1 -0.0916878 0.0234704 -3.907 9.71e-05 ***
Age -0.0024831 0.0007319 -3.393 0.000707 ***
Sex1:Age -0.0017244 0.0007319 -2.356 0.018566 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
Sigma link function: log
Sigma Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.1605 0.0159 -73.01 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
NOTE: Additive smoothing terms exist in the formulas:
i) Std. Error for smoothers are for the linear effect only.
ii) Std. Error for the linear terms maybe are not accurate.
------------------------------------------------------------------
No. of observations in the fit: 1979
Degrees of Freedom for the fit: 161.6924
Residual Deg. of Freedom: 1817.308
at cycle: 35
Global Deviance: 1022.798
AIC: 1346.183
SBC: 2250.099
******************************************************************
The compared models are:
| Model | Deviance | AIC |
|---|---|---|
| Age * Sex | 1022.8 | 1346.2 |
| Age + Sex | 1022.8 | 1344.2 |
| Sex | 1022.8 | 1342.2 |
So, even if Age has a significant (negative) effect and interaction with Sex (positive for males), adding it does not seem to be warranted here: the deviance is unchanged and the AIC is lowest for the model with Sex only.
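For readers who want to check the fit indices, the AIC and SBC reported by `gamlss` can be recovered from the global deviance and the degrees of freedom of the fit, as AIC = deviance + 2·df and SBC = deviance + log(n)·df. The sketch below simply re-enters the numbers printed in the summary above; the variable names are ours.

```r
# Values copied from the model summary above (SR ~ Sex * Age, constant sigma).
n.obs      <- 1979       # No. of observations in the fit
df.fit     <- 161.6924   # Degrees of Freedom for the fit
global.dev <- 1022.798   # Global Deviance

# Generalised AIC with penalty k = 2 (AIC) and k = log(n) (SBC).
aic <- global.dev + 2 * df.fit
sbc <- global.dev + log(n.obs) * df.fit

round(c(AIC = aic, SBC = sbc), 3)
```

Because the three compared models have the same deviance (1022.8), their AIC values differ only through the penalty term, which is why each additional term raises the AIC by about 2.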
******************************************************************
Summary of the Quantile Residuals
mean = 0.002353284
variance = 1.000499
coef. of skewness = 0.02299299
coef. of kurtosis = 2.90592
Filliben correlation coefficient = 0.9993823
******************************************************************
The model including Age * Sex is:
******************************************************************
Family: c("NO", "Normal")
Call: gamlss(formula = SR ~ 1 + Sex * Age + random(Text) + random(Language) + random(Family) + random(Speaker), sigma.formula = ~1 + Sex * Age + random(Text) + random(Language) + random(Family) + random(Speaker),
family = NO(mu.link = "identity"), data = info.rate.data.for.age, control = gamlss.control(n.cyc = 800, trace = FALSE), i.control = glim.control(bf.cyc = 800))
Fitting method: RS()
------------------------------------------------------------------
Mu link function: identity
Mu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.5465239 0.0203601 321.536 < 2e-16 ***
Sex1 -0.0442321 0.0203601 -2.172 0.03 *
Age 0.0038165 0.0006483 5.887 4.71e-09 ***
Sex1:Age -0.0030913 0.0006483 -4.768 2.01e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
Sigma link function: log
Sigma Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.330351 0.053970 -24.650 <2e-16 ***
Sex1 0.019846 0.053970 0.368 0.7131
Age 0.003029 0.001685 1.798 0.0724 .
Sex1:Age -0.002561 0.001685 -1.520 0.1286
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
NOTE: Additive smoothing terms exist in the formulas:
i) Std. Error for smoothers are for the linear effect only.
ii) Std. Error for the linear terms maybe are not accurate.
------------------------------------------------------------------
No. of observations in the fit: 1979
Degrees of Freedom for the fit: 243.8639
Residual Deg. of Freedom: 1735.136
at cycle: 10
Global Deviance: 721.7861
AIC: 1209.514
SBC: 2572.798
******************************************************************
The compared models are:
| Model | Deviance | AIC |
|---|---|---|
| Age * Sex | 721.8 | 1209.5 |
| Age + Sex | 721.5 | 1206.1 |
| Sex | 721.3 | 1203 |
So, even if Age has a significant (negative) effect (but no interaction with Sex), adding it does not seem to be warranted here either…
The distribution of the residuals is less heteroscedastic than before and the fit to the data is better.
Thus, for SR, even if there is a hint that Age might affect it negatively (and there might also be an interaction with Sex with a positive effect for males), overall, the various fit indices do not warrant its inclusion in the GAMLSS models.
******************************************************************
Summary of the Quantile Residuals
mean = 1.384527e-05
variance = 1.000506
coef. of skewness = 0.09166289
coef. of kurtosis = 3.764844
Filliben correlation coefficient = 0.9974381
******************************************************************
The model including Age * Sex is:
******************************************************************
Family: c("NO", "Normal")
Call: gamlss(formula = IR ~ 1 + Sex * Age + random(Text) + random(Language) + random(Family) + random(Speaker), family = NO(mu.link = "identity"), data = info.rate.data.for.age, control = gamlss.control(n.cyc = 800, trace = FALSE),
i.control = glim.control(bf.cyc = 800))
Fitting method: RS()
------------------------------------------------------------------
Mu link function: identity
Mu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 40.015135 0.137507 291.003 < 2e-16 ***
Sex1 -0.175295 0.137508 -1.275 0.203
Age -0.035742 0.004288 -8.336 < 2e-16 ***
Sex1:Age -0.023187 0.004288 -5.408 7.22e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
Sigma link function: log
Sigma Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.6074 0.0159 38.21 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
NOTE: Additive smoothing terms exist in the formulas:
i) Std. Error for smoothers are for the linear effect only.
ii) Std. Error for the linear terms maybe are not accurate.
------------------------------------------------------------------
No. of observations in the fit: 1979
Degrees of Freedom for the fit: 161.827
Residual Deg. of Freedom: 1817.173
at cycle: 2
Global Deviance: 8020.303
AIC: 8343.957
SBC: 9248.626
******************************************************************
The compared models are:
| Model | Deviance | AIC |
|---|---|---|
| Age * Sex | 8020.3 | 8344 |
| Age + Sex | 8020.3 | 8342 |
| Sex | 8020.3 | 8340 |
So, even if Age has a significant (negative) effect and interaction with Sex (positive for males), and, interestingly, the main effect of Sex disappears in this case, adding it does not seem to be warranted: the deviance is unchanged and the AIC is lowest for the model with Sex only.
******************************************************************
Summary of the Quantile Residuals
mean = 0.002387788
variance = 1.00051
coef. of skewness = 0.02686615
coef. of kurtosis = 2.9163
Filliben correlation coefficient = 0.9993496
******************************************************************
The model including Age * Sex is:
******************************************************************
Family: c("NO", "Normal")
Call: gamlss(formula = IR ~ 1 + Sex * Age + random(Text) + random(Language) + random(Family) + random(Speaker), sigma.formula = ~1 + Sex * Age + random(Text) + random(Language) + random(Family) + random(Speaker),
family = NO(mu.link = "identity"), data = info.rate.data.for.age, control = gamlss.control(n.cyc = 800, trace = FALSE), i.control = glim.control(bf.cyc = 800))
Fitting method: RS()
------------------------------------------------------------------
Mu link function: identity
Mu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 40.041563 0.118455 338.031 < 2e-16 ***
Sex1 -0.299328 0.118455 -2.527 0.0116 *
Age -0.036894 0.003708 -9.950 < 2e-16 ***
Sex1:Age -0.019096 0.003708 -5.150 2.9e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
Sigma link function: log
Sigma Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.479445 0.053990 8.880 <2e-16 ***
Sex1 0.025025 0.053990 0.464 0.643
Age 0.001854 0.001685 1.100 0.271
Sex1:Age -0.002747 0.001685 -1.630 0.103
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
NOTE: Additive smoothing terms exist in the formulas:
i) Std. Error for smoothers are for the linear effect only.
ii) Std. Error for the linear terms maybe are not accurate.
------------------------------------------------------------------
No. of observations in the fit: 1979
Degrees of Freedom for the fit: 244.3689
Residual Deg. of Freedom: 1734.631
at cycle: 10
Global Deviance: 7726.653
AIC: 8215.391
SBC: 9581.498
******************************************************************
The compared models are:
| Model | Deviance | AIC |
|---|---|---|
| Age * Sex | 7726.7 | 8215.4 |
| Age + Sex | 7726.4 | 8212.2 |
| Sex | 7725.8 | 8208.8 |
So, even if Age has a significant (negative) effect and interaction with Sex (positive for males) – interestingly, in this case the main effect of Sex disappears –, adding it does not seem to be warranted…
The distribution of the residuals is less heteroscedastic than before and the fit to the data is better.
Thus, while for IR the hint that Age has a negative main effect and interacts with Sex (with a positive effect for males, containing the whole effect of Sex) is much stronger, the various fit indices do not warrant its inclusion in the GAMLSS models.
******************************************************************
Summary of the Quantile Residuals
mean = 0.001744301
variance = 1.000502
coef. of skewness = 0.008063776
coef. of kurtosis = 2.834364
Filliben correlation coefficient = 0.9993747
******************************************************************
The model including Age * Sex is:
******************************************************************
Family: c("NO", "Normal")
Call: gamlss(formula = SR ~ 1 + ID + Sex * Age + random(Text) + random(Speaker) + random(Family), sigma.formula = ~1 + ID + Sex * Age + random(Text) + random(Speaker) + random(Family), family = NO(mu.link = "identity"),
data = info.rate.data.for.age, control = gamlss.control(n.cyc = 800, trace = FALSE), i.control = glim.control(bf.cyc = 800))
Fitting method: RS()
------------------------------------------------------------------
Mu link function: identity
Mu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.6793628 0.0517065 245.218 < 2e-16 ***
ID -0.9813533 0.0071683 -136.903 < 2e-16 ***
Sex1 0.0102517 0.0204390 0.502 0.616
Age -0.0059906 0.0006663 -8.991 < 2e-16 ***
Sex1:Age -0.0050056 0.0006537 -7.658 3.12e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
Sigma link function: log
Sigma Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.790248 0.133246 -5.931 3.63e-09 ***
ID -0.086078 0.019128 -4.500 7.24e-06 ***
Sex1 0.046494 0.054368 0.855 0.3926
Age 0.001961 0.001718 1.141 0.2540
Sex1:Age -0.003459 0.001698 -2.037 0.0418 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
NOTE: Additive smoothing terms exist in the formulas:
i) Std. Error for smoothers are for the linear effect only.
ii) Std. Error for the linear terms maybe are not accurate.
------------------------------------------------------------------
No. of observations in the fit: 1979
Degrees of Freedom for the fit: 236.269
Residual Deg. of Freedom: 1742.731
at cycle: 8
Global Deviance: 705.2126
AIC: 1177.751
SBC: 2498.577
******************************************************************
The compared models are:
| Model | Deviance | AIC |
|---|---|---|
| ID * Sex * Age | 705.2 | 1188.4 |
| ID + Sex * Age | 705.2 | 1177.8 |
| ID * Sex + Age | 704.2 | 1178.2 |
| ID + Sex + Age | 704.5 | 1174.5 |
| ID * Sex | 704.4 | 1174.6 |
| ID + Sex | 704.6 | 1170.9 |
Clearly, adding Age is not warranted here (nor is the interaction between ID and Sex)…
As above, we also looked at the simple lmer model:
The compared models are:
| Model | AIC |
|---|---|
| ID * Sex * Age | 1811.2 |
| ID + Sex * Age | 1790.7 |
| ID * Sex + Age | 1786.3 |
| ID + Sex + Age | 1781 |
| ID * Sex | 1777.8 |
| ID + Sex | 1772.5 |
The best model is still the one not including Age:
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: SR ~ 1 + ID + Sex + (1 | Text) + (1 | Speaker) + (1 | Family)
Data: info.rate.data.for.age
REML criterion at convergence: 1758.5
Scaled residuals:
Min 1Q Median 3Q Max
-3.8062 -0.6300 0.0174 0.5882 5.2084
Random effects:
Groups Name Variance Std.Dev.
Speaker (Intercept) 0.35876 0.5990
Text (Intercept) 0.01494 0.1222
Family (Intercept) 0.26668 0.5164
Residual 0.10575 0.3252
Number of obs: 1979, groups: Speaker, 132; Text, 15; Family, 9
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 10.80777 0.77185 27.12524 14.002 6.21e-14 ***
ID -0.70707 0.12477 31.07295 -5.667 3.15e-06 ***
Sex1 -0.14683 0.05268 119.92464 -2.787 0.00618 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) ID
ID -0.971
Sex1 -0.001 0.003
Within the limits of our reduced dataset (containing only 132 speakers with Age info), we found the following:
When modelling SR and IR with GAMLSS, while there are hints that Age has, for both, an overall negative main effect and an interaction with Sex (positive for males), it does not seem warranted to include it in these models.
When modelling the relationship between SR and ID, this negative relationship seems to be modulated by Age (in a sex-dependent manner), but, alas, the inclusion of Age is not warranted in the GAMLSS model, nor (really) in the simpler LMER model.
Thus, while Age seems to negatively influence (in a sex-dependent manner) both SR and IR, as well as strengthen the negative relationship between them, its effects are far from clear in the current dataset.
Here we test the hypothesis that ID is confounded by Age and Sex structure between languages:
Data: info.rate.data.for.age
Models:
model.ID.age: ID ~ Age + (1 | Family)
model.ID.age.sex: ID ~ 1 + Sex * Age + (1 | Family)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
model.ID.age 4 1089.1 1111.5 -540.57 1081.1
model.ID.age.sex 6 1091.9 1125.5 -539.96 1079.9 1.2113 2 0.5457
Data: info.rate.data.for.age
Models:
model.ID: ID ~ (1 | Family)
model.ID.age.sex: ID ~ 1 + Sex * Age + (1 | Family)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
model.ID 3 1088.2 1105.0 -541.09 1082.2
model.ID.age.sex 6 1091.9 1125.5 -539.96 1079.9 2.2555 3 0.5211
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: ID ~ 1 + Sex * Age + (1 | Family)
Data: info.rate.data.for.age
REML criterion at convergence: 1113.3
Scaled residuals:
Min 1Q Median 3Q Max
-1.17017 -0.67570 -0.01297 0.05397 2.96861
Random effects:
Groups Name Variance Std.Dev.
Family (Intercept) 1.14186 1.0686
Residual 0.09778 0.3127
Number of obs: 1979, groups: Family, 9
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 5.992e+00 3.571e-01 8.071e+00 16.778 1.46e-07 ***
Sex1 -2.587e-02 2.361e-02 1.967e+03 -1.096 0.273
Age 9.073e-04 8.119e-04 1.967e+03 1.118 0.264
Sex1:Age 7.923e-04 7.370e-04 1.967e+03 1.075 0.282
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) Sex1 Age
Sex1 0.005
Age -0.068 -0.087
Sex1:Age -0.005 -0.955 0.095
Thus, there does not seem to be any relationship between ID, Age and Sex.
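The p-values in the two model comparisons above come from likelihood ratio tests, where the Chisq statistic is referred to a chi-squared distribution whose degrees of freedom equal the difference in the number of parameters. As a small check, the sketch below re-enters the printed statistics; the variable names are ours.

```r
# Comparison 1: model.ID.age (4 parameters) vs model.ID.age.sex (6 parameters).
chisq.1 <- 1.2113
p.1     <- pchisq(chisq.1, df = 6 - 4, lower.tail = FALSE)

# Comparison 2: model.ID (3 parameters) vs model.ID.age.sex (6 parameters).
chisq.2 <- 2.2555
p.2     <- pchisq(chisq.2, df = 6 - 3, lower.tail = FALSE)

round(c(p.1 = p.1, p.2 = p.2), 4)
```

Both p-values are far above the usual thresholds, in line with the conclusion that neither Age nor Sex improves the model of ID.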
In our case, SDIR (versus Vietnamese) reduces to the ratio of the number of syllables (NS) in Vietnamese to the NS in the language L, computed for each Text separately; we also denote this here as NSVR (from “NS Vietnamese Ratio”).
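The computation of NSVR can be sketched as below. This is only an illustration under stated assumptions: the syllable counts are made up, and the language code "VIE" for Vietnamese is our assumption (only codes such as "CAT" appear in the table of syllable counts).

```r
# Made-up syllable counts per language and text ("VIE" is a hypothetical code).
ns <- data.frame(
  Language = c("VIE", "VIE", "CAT", "CAT"),
  Text     = c("O1", "O2", "O1", "O2"),
  NS       = c(66, 90, 88, 120)
)

# Extract the Vietnamese NS for each text ...
vie <- ns[ns$Language == "VIE", c("Text", "NS")]
names(vie)[2] <- "NS.VIE"

# ... attach it to every (Language, Text) row and take the ratio.
d <- merge(ns, vie, by = "Text")
d$NSVR <- d$NS.VIE / d$NS
```

By construction NSVR equals 1 for Vietnamese itself, and is below 1 for a language that needs more syllables than Vietnamese to express the same text.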
Syntagmatic density of information ratio SDIR (relative to Vietnamese) versus ID with LOESS smoother (black) and linear regression (yellow) and their 95%CIs.
The “flat” correlations (Pearson and Spearman) between SDIR and ID are:
Pearson's product-moment correlation
data: d$NSVR and d$ID
t = 13.46, df = 253, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.5682215 0.7122938
sample estimates:
cor
0.6459739
Spearman's rank correlation rho
data: d$NSVR and d$ID
S = 1108400, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.5989002
The multi-level regression of SDIR on ID (with Text as random effect) and the Text’s ICC are:
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: NSVR ~ ID + (1 | Text)
Data: d
REML criterion at convergence: -394.5
Scaled residuals:
Min 1Q Median 3Q Max
-2.1594 -0.6786 -0.0734 0.5435 3.8668
Random effects:
Groups Name Variance Std.Dev.
Text (Intercept) 0.013819 0.11755
Residual 0.009879 0.09939
Number of obs: 255, groups: Text, 15
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) -0.124714 0.052989 98.407896 -2.354 0.0206 *
ID 0.146209 0.007138 239.000000 20.484 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
ID -0.811
Intraclass Correlation Coefficient for Linear mixed model
Family : gaussian (identity)
Formula: NSVR ~ ID + (1 | Text)
ICC (Text): 0.5831
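The reported ICC can be recovered directly from the variance components of the mixed model above, as the share of the total (random intercept plus residual) variance that is due to Text. The sketch below re-enters the printed variances; the variable names are ours.

```r
# Variance components copied from the "Random effects" table above.
var.text     <- 0.013819   # Text (Intercept)
var.residual <- 0.009879   # Residual

# Intraclass correlation coefficient for Text.
icc.text <- var.text / (var.text + var.residual)

round(icc.text, 4)
```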
If we consider just the average SDIR for each language:
Average syntagmatic density of information ratio SDIR (relative to Vietnamese) versus ID with LOESS smoother (black) and linear regression (yellow) and their 95%CIs.
Pearson's product-moment correlation
data: d1$NSVR.mean and d1$ID
t = 8.6199, df = 15, p-value = 3.394e-07
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.7683980 0.9682841
sample estimates:
cor
0.9121585
Spearman's rank correlation rho
data: d1$NSVR.mean and d1$ID
S = 162.6, p-value = 0.0001126
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.8007359
Call:
lm(formula = NSVR.mean ~ ID, data = d1)
Residuals:
Min 1Q Median 3Q Max
-0.07917 -0.04788 -0.01277 0.02717 0.13245
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.12471 0.10321 -1.208 0.246
ID 0.14621 0.01696 8.620 3.39e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.06098 on 15 degrees of freedom
Multiple R-squared: 0.832, Adjusted R-squared: 0.8208
F-statistic: 74.3 on 1 and 15 DF, p-value: 3.394e-07
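Note that, for a simple regression with a single predictor, the t statistic of the slope and that of the Pearson correlation coincide, and the Multiple R-squared is the squared correlation. The sketch below checks this using the printed correlation; the variable names are ours.

```r
# Pearson correlation between NSVR.mean and ID, and its degrees of freedom.
r     <- 0.9121585
deg.f <- 15

# t statistic of the correlation (identical to the t value of the ID slope).
t.from.r  <- r * sqrt(deg.f / (1 - r^2))

# Squared correlation (the Multiple R-squared of the regression).
r.squared <- r^2

round(c(t = t.from.r, R2 = r.squared), 4)
```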
In what follows, the mixing probabilities are independent of factors such as Sex.
Between 1 and 5 Gaussian distributions:
1 component
Mixing Family: "NO"
Fitting method: EM algorithm
Call: gamlssMX(formula = SR ~ 1, family = NO, K = 1, data = d, plot = FALSE)
Mu Coefficients for model: 1
(Intercept)
6.631
Sigma Coefficients for model: 1
(Intercept)
0.1378
Estimated probabilities: 1
Degrees of Freedom for the fit: 2 Residual Deg. of Freedom 2286
Global Deviance: 7123.61
AIC: 7127.61
SBC: 7139.08
2 components
Mixing Family: c("NO", "NO")
Fitting method: EM algorithm
Call: gamlssMX(formula = SR ~ 1, family = NO, K = 2, data = d, plot = FALSE)
Mu Coefficients for model: 1
(Intercept)
5.26
Sigma Coefficients for model: 1
(Intercept)
-0.4778
Mu Coefficients for model: 2
(Intercept)
7.15
Sigma Coefficients for model: 2
(Intercept)
-0.1862
Estimated probabilities: 0.27457 0.72543
Degrees of Freedom for the fit: 5 Residual Deg. of Freedom 2283
Global Deviance: 7001.23
AIC: 7011.23
SBC: 7039.9
3 components
Mixing Family: c("NO", "NO", "NO")
Fitting method: EM algorithm
Call: gamlssMX(formula = SR ~ 1, family = NO, K = 3, data = d, plot = FALSE)
Mu Coefficients for model: 1
(Intercept)
7.257
Sigma Coefficients for model: 1
(Intercept)
-0.2162
Mu Coefficients for model: 2
(Intercept)
6.44
Sigma Coefficients for model: 2
(Intercept)
-0.1447
Mu Coefficients for model: 3
(Intercept)
5.199
Sigma Coefficients for model: 3
(Intercept)
-0.5124
Estimated probabilities: 0.590421 0.1746332 0.2349458
Degrees of Freedom for the fit: 8 Residual Deg. of Freedom 2280
Global Deviance: 7001.95
AIC: 7017.95
SBC: 7063.84
4 components
Mixing Family: c("NO", "NO", "NO", "NO")
Fitting method: EM algorithm
Call: gamlssMX(formula = SR ~ 1, family = NO, K = 4, data = d, plot = FALSE)
Mu Coefficients for model: 1
(Intercept)
7.012
Sigma Coefficients for model: 1
(Intercept)
-0.03502
Mu Coefficients for model: 2
(Intercept)
5.323
Sigma Coefficients for model: 2
(Intercept)
-0.4349
Mu Coefficients for model: 3
(Intercept)
7.257
Sigma Coefficients for model: 3
(Intercept)
-0.09049
Mu Coefficients for model: 4
(Intercept)
7.2
Sigma Coefficients for model: 4
(Intercept)
-0.6349
Estimated probabilities: 0.231015 0.2886475 0.2650284 0.2153091
Degrees of Freedom for the fit: 11 Residual Deg. of Freedom 2277
Global Deviance: 6996.59
AIC: 7018.59
SBC: 7081.68
5 components
Mixing Family: c("NO", "NO", "NO", "NO", "NO")
Fitting method: EM algorithm
Call: gamlssMX(formula = SR ~ 1, family = NO, K = 5, data = d, plot = FALSE)
Mu Coefficients for model: 1
(Intercept)
6.772
Sigma Coefficients for model: 1
(Intercept)
-0.1062
Mu Coefficients for model: 2
(Intercept)
7.611
Sigma Coefficients for model: 2
(Intercept)
-0.2516
Mu Coefficients for model: 3
(Intercept)
7.207
Sigma Coefficients for model: 3
(Intercept)
-1.041
Mu Coefficients for model: 4
(Intercept)
5.339
Sigma Coefficients for model: 4
(Intercept)
-0.4062
Mu Coefficients for model: 5
(Intercept)
6.852
Sigma Coefficients for model: 5
(Intercept)
-0.1055
Estimated probabilities: 0.1783584 0.2294359 0.1266083 0.2811023 0.1844951
Degrees of Freedom for the fit: 14 Residual Deg. of Freedom 2274
Global Deviance: 6988.48
AIC: 7016.48
SBC: 7096.78
Comparing AIC
df AIC
mix.SR.NO.2 5 7011.225
mix.SR.NO.5 14 7016.484
mix.SR.NO.3 8 7017.954
mix.SR.NO.4 11 7018.591
mix.SR.NO.1 2 7127.609
Showing the distributions
Mixture of Gaussians for SR.
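To help read the mixture output above, the sketch below rebuilds the density of the two-component model for SR from the printed coefficients. It assumes that the Sigma coefficients are on the log scale (the default sigma link of the NO family), so that the standard deviations are obtained by exponentiation; the function `mix.density` is ours and not part of `gamlss.mx`.

```r
# Coefficients copied from the 2-component fit for SR.
mu        <- c(5.26, 7.15)
log.sigma <- c(-0.4778, -0.1862)   # assumed to be on the log link scale
prob      <- c(0.27457, 0.72543)

# Density of a Gaussian mixture: probability-weighted sum of normal densities.
mix.density <- function(x, mu, log.sigma, prob) {
  dens <- numeric(length(x))
  for (k in seq_along(mu)) {
    dens <- dens + prob[k] * dnorm(x, mean = mu[k], sd = exp(log.sigma[k]))
  }
  dens
}

# Sanity checks: the density integrates to (about) 1 over a wide SR range,
# and the mixture mean is the probability-weighted mean of the components.
total.mass <- integrate(mix.density, lower = 0, upper = 15,
                        mu = mu, log.sigma = log.sigma, prob = prob)$value
mix.mean   <- sum(prob * mu)

# AIC from the printed global deviance and degrees of freedom.
aic.2 <- 7001.23 + 2 * 5
```

The mixture mean is close to the mean of the one-component fit (6.631), as expected when both models describe the same data.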
Between 1 and 5 Gaussian distributions:
1 component
Mixing Family: "NO"
Fitting method: EM algorithm
Call: gamlssMX(formula = IR ~ 1, family = NO, K = 1, data = d, plot = FALSE)
Mu Coefficients for model: 1
(Intercept)
39.15
Sigma Coefficients for model: 1
(Intercept)
1.629
Estimated probabilities: 1
Degrees of Freedom for the fit: 2 Residual Deg. of Freedom 2286
Global Deviance: 13945.2
AIC: 13949.2
SBC: 13960.7
2 components
Mixing Family: c("NO", "NO")
Fitting method: EM algorithm
Call: gamlssMX(formula = IR ~ 1, family = NO, K = 2, data = d, plot = FALSE)
Mu Coefficients for model: 1
(Intercept)
41.16
Sigma Coefficients for model: 1
(Intercept)
1.861
Mu Coefficients for model: 2
(Intercept)
38.38
Sigma Coefficients for model: 2
(Intercept)
1.442
Estimated probabilities: 0.2770978 0.7229022
Degrees of Freedom for the fit: 5 Residual Deg. of Freedom 2283
Global Deviance: 13895
AIC: 13905
SBC: 13933.7
3 components
Mixing Family: c("NO", "NO", "NO")
Fitting method: EM algorithm
Call: gamlssMX(formula = IR ~ 1, family = NO, K = 3, data = d, plot = FALSE)
Mu Coefficients for model: 1
(Intercept)
40.01
Sigma Coefficients for model: 1
(Intercept)
0.9221
Mu Coefficients for model: 2
(Intercept)
42.61
Sigma Coefficients for model: 2
(Intercept)
1.713
Mu Coefficients for model: 3
(Intercept)
35.75
Sigma Coefficients for model: 3
(Intercept)
1.374
Estimated probabilities: 0.2997471 0.3102765 0.3899765
Degrees of Freedom for the fit: 8 Residual Deg. of Freedom 2280
Global Deviance: 13875.7
AIC: 13891.7
SBC: 13937.6
4 components
Mixing Family: c("NO", "NO", "NO", "NO")
Fitting method: EM algorithm
Call: gamlssMX(formula = IR ~ 1, family = NO, K = 4, data = d, plot = FALSE)
Mu Coefficients for model: 1
(Intercept)
39.59
Sigma Coefficients for model: 1
(Intercept)
0.7231
Mu Coefficients for model: 2
(Intercept)
42.84
Sigma Coefficients for model: 2
(Intercept)
1.772
Mu Coefficients for model: 3
(Intercept)
40.16
Sigma Coefficients for model: 3
(Intercept)
1.364
Mu Coefficients for model: 4
(Intercept)
34.43
Sigma Coefficients for model: 4
(Intercept)
1.254
Estimated probabilities: 0.2027138 0.2280114 0.3066644 0.2626103
Degrees of Freedom for the fit: 11 Residual Deg. of Freedom 2277
Global Deviance: 13870.5
AIC: 13892.5
SBC: 13955.6
5 components
Mixing Family: c("NO", "NO", "NO", "NO", "NO")
Fitting method: EM algorithm
Call: gamlssMX(formula = IR ~ 1, family = NO, K = 5, data = d, plot = FALSE)
Mu Coefficients for model: 1
(Intercept)
34.41
Sigma Coefficients for model: 1
(Intercept)
1.261
Mu Coefficients for model: 2
(Intercept)
43.85
Sigma Coefficients for model: 2
(Intercept)
1.77
Mu Coefficients for model: 3
(Intercept)
39.55
Sigma Coefficients for model: 3
(Intercept)
0.5825
Mu Coefficients for model: 4
(Intercept)
40.27
Sigma Coefficients for model: 4
(Intercept)
1.35
Mu Coefficients for model: 5
(Intercept)
40.07
Sigma Coefficients for model: 5
(Intercept)
1.376
Estimated probabilities: 0.2666359 0.1665143 0.1540443 0.2038578 0.2089478
Degrees of Freedom for the fit: 14 Residual Deg. of Freedom 2274
Global Deviance: 13868.7
AIC: 13896.7
SBC: 13977
Comparing AIC
df AIC
mix.IR.NO.3 8 13891.72
mix.IR.NO.4 11 13892.47
mix.IR.NO.5 14 13896.74
mix.IR.NO.2 5 13905.05
mix.IR.NO.1 2 13949.21
Showing the distributions
Mixture of Gaussians for IR.
We used three ways to estimate how unimodal a distribution is, as they tend to disagree and the problem of unimodality testing is far from settled (see Freeman & Dale, 2013): Silverman's test, the dip test (diptest), and the bimodality coefficient (BC).
For each such test, we performed four randomisation procedures (PM1 to PM4) to obtain an estimate of the “specialness” of the observed unimodality estimate; for each new permuted dataset, we recompute everything before estimating the unimodality of the permuted distribution.
The observed estimate (vertical blue solid line), the permuted distribution (gray histogram), and the “unimodality region” (shaded green rectangle) are shown below (for PM3, we also show the original estimate using the Speaker average SR as a vertical solid red line).
Permutation of the texts’ SRs (PM1).
Permutation of the languages’ ID (PM2).
Permutation of the speakers’ average SRs (PM3).
Permutation of the languages’ average SRs with speaker adjustment (PM4).
| Scenario | Measure | Test | Observed estimate (p-value) | % more unimodal permutations |
|---|---|---|---|---|
| PM1 | SR | Silverman | - (0.024) | 55.5% |
| PM1 | SR | Dip | 0.005 (0.984) * | 100% |
| PM1 | SR | BC | 0.19 () * | 100% |
| PM1 | IR | Silverman | - (0.835) * | 15.3% |
| PM1 | IR | Dip | 0.005 (0.992) * | 71% |
| PM1 | IR | BC | 0.167 () * | 100% |
| PM2 | SR | Silverman | - (0.024) | 56.4% |
| PM2 | SR | Dip | 0.005 (0.984) * | 100% |
| PM2 | SR | BC | 0.19 () * | 100% |
| PM2 | IR | Silverman | - (0.835) * | 2.7% |
| PM2 | IR | Dip | 0.005 (0.992) * | 25.9% |
| PM2 | IR | BC | 0.167 () * | 97.3% |
| PM3 | SR | Silverman | - (0.024) | 100% |
| PM3 | SR | Dip | 0.005 (0.984) * | 0% |
| PM3 | SR | BC | 0.19 () * | 100% |
| PM3 | IR | Silverman | - (0.835) * | 16.8% |
| PM3 | IR | Dip | 0.005 (0.992) * | 17.5% |
| PM3 | IR | BC | 0.167 () * | 100% |
| PM4 | SR | Silverman | - (0.024) | 4.1% |
| PM4 | SR | Dip | 0.005 (0.984) * | 1.5% |
| PM4 | SR | BC | 0.19 () * | 82.8% |
| PM4 | IR | Silverman | - (0.835) * | 0.7% |
| PM4 | IR | Dip | 0.005 (0.992) * | 13% |
| PM4 | IR | BC | 0.167 () * | 96.8% |
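As a rough illustration of how the last column of the table above can be obtained, the sketch below permutes the languages' ID (in the spirit of PM2), recomputes IR, and counts how often the permuted distribution is more unimodal than the observed one according to the bimodality coefficient. Several things here are assumptions rather than the original procedure: the data are made up, IR is taken to be the product SR × ID, and BC is computed from simple moment estimators of skewness and excess kurtosis (lower BC being read as more unimodal).

```r
# Bimodality coefficient from simple moment estimators (an assumption; the
# original analysis may rely on bias-corrected estimators instead).
bimodality.coef <- function(x) {
  n  <- length(x)
  z  <- x - mean(x)
  s  <- sqrt(mean(z^2))
  g1 <- mean(z^3) / s^3        # skewness
  g2 <- mean(z^4) / s^4 - 3    # excess kurtosis
  (g1^2 + 1) / (g2 + 3 * (n - 1)^2 / ((n - 2) * (n - 3)))
}

set.seed(1)

# Hypothetical per-language averages of SR and ID.
lang <- data.frame(SR = c(5.2, 5.9, 6.3, 6.8, 7.2, 7.6, 8.0),
                   ID = c(8.0, 7.0, 6.3, 5.9, 5.5, 5.2, 4.9))

# Observed estimate on IR (assumed here to be SR * ID).
observed <- bimodality.coef(lang$SR * lang$ID)

# Permute ID across languages and recompute IR and its BC each time.
permuted <- replicate(1000, {
  bimodality.coef(lang$SR * sample(lang$ID))
})

# Percentage of permutations that are more unimodal than the observed data.
pct.more.unimodal <- 100 * mean(permuted < observed)
```

A small percentage means that few permuted datasets are as unimodal as the observed one, i.e. that the observed pairing of SR and ID yields an unusually unimodal IR.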
We compute various distances between languages (as implemented by function distance() in package philentropy) with respect to the distributions of NS, SR and IR.
Comparing the distribution of pairwise distances between languages.
| Measure 1 | Measure 2 | Distance | Mean 1 | Median 1 | SD 1 | Mean 2 | Median 2 | SD 2 | p |
|---|---|---|---|---|---|---|---|---|---|
| IR | NS | Hellinger | 0.88 | 0.83 | 0.32 | 1.20 | 1.19 | 0.39 | 0.00 |
| IR | NS | Jensen-Shannon | 0.17 | 0.14 | 0.12 | 0.29 | 0.26 | 0.16 | 0.00 |
| IR | NS | Kolmogorov–Smirnov | 0.42 | 0.37 | 0.20 | 0.57 | 0.57 | 0.23 | 0.00 |
| IR | NS | Kullback-Leibler | 7.13 | 4.22 | 7.81 | 15.42 | 13.12 | 13.31 | 0.00 |
| IR | NS | Squared-Chi | 0.56 | 0.45 | 0.36 | 0.88 | 0.79 | 0.47 | 0.00 |
| IR | SR | Hellinger | 0.88 | 0.83 | 0.32 | 1.10 | 1.06 | 0.48 | 0.00 |
| IR | SR | Jensen-Shannon | 0.17 | 0.14 | 0.12 | 0.27 | 0.23 | 0.20 | 0.00 |
| IR | SR | Kolmogorov–Smirnov | 0.42 | 0.37 | 0.20 | 0.56 | 0.55 | 0.27 | 0.00 |
| IR | SR | Kullback-Leibler | 7.13 | 4.22 | 7.81 | 12.80 | 6.69 | 14.88 | 0.00 |
| IR | SR | Squared-Chi | 0.56 | 0.45 | 0.36 | 0.86 | 0.77 | 0.57 | 0.00 |
| NS | SR | Hellinger | 1.20 | 1.19 | 0.39 | 1.10 | 1.06 | 0.48 | 0.01 |
| NS | SR | Jensen-Shannon | 0.29 | 0.26 | 0.16 | 0.27 | 0.23 | 0.20 | 0.30 |
| NS | SR | Kolmogorov–Smirnov | 0.57 | 0.57 | 0.23 | 0.56 | 0.55 | 0.27 | 0.81 |
| NS | SR | Kullback-Leibler | 15.42 | 13.12 | 13.31 | 12.80 | 6.69 | 14.88 | 0.05 |
| NS | SR | Squared-Chi | 0.88 | 0.79 | 0.47 | 0.86 | 0.77 | 0.57 | 0.62 |
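To make the notion of a distance between two languages' distributions more concrete, the sketch below computes two of the measures in the table above for a pair of made-up samples, using base R only. It is not the original code: the analysis relies on `distance()` from philentropy, whose binning, logarithm base and scaling may differ from this hand-written version, and the two samples are hypothetical.

```r
# Jensen-Shannon divergence between two probability vectors (natural log).
js.divergence <- function(p, q) {
  m  <- (p + q) / 2
  kl <- function(a, b) sum(ifelse(a > 0, a * log(a / b), 0))
  0.5 * kl(p, m) + 0.5 * kl(q, m)
}

set.seed(1)

# Hypothetical IR values for two languages.
ir.a <- rnorm(100, mean = 39, sd = 5)
ir.b <- rnorm(100, mean = 40, sd = 5)

# Bin both samples on a common grid to obtain comparable probability vectors.
all.ir <- c(ir.a, ir.b)
breaks <- seq(floor(min(all.ir)), ceiling(max(all.ir)), length.out = 11)
p <- hist(ir.a, breaks = breaks, plot = FALSE)$counts / length(ir.a)
q <- hist(ir.b, breaks = breaks, plot = FALSE)$counts / length(ir.b)

js <- js.divergence(p, q)

# The Kolmogorov-Smirnov distance works on the raw samples directly.
ks <- unname(ks.test(ir.a, ir.b)$statistic)
```

Repeating this for every pair of languages, and for each of NS, SR and IR, gives the distributions of pairwise distances that are summarised in the table.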
Campione, E., & Véronis, J. (1998). A multilingual prosodic database. Proc. of the 5th International Conference on Spoken Language Processing (ICSLP’98), Sydney, Australia, 3163-3166.
Freeman, J. B., & Dale, R. (2013). Assessing bimodality to detect the presence of a dual cognitive process. Behavior research methods, 45(1), 83-97.
Hall, P., & York, M. (2001). On the calibration of Silverman’s test for multimodality. Statistica Sinica, 11, 515-536.
Hartigan, J. A., & Hartigan, P. M. (1985). The Dip Test of Unimodality. Annals of Statistics, 13, 70–84.
Hartigan, P. M. (1985). Computation of the Dip Statistic to Test for Unimodality. Applied Statistics (JRSS C), 34, 320–325.
Le, V. B., Tran, D. D., Castelli, E., Besacier, L., & Serignat, J. F. (2004). Spoken and Written Language Resources for Vietnamese. In LREC. 4, pp. 599-602.
Lyding, V., Stemle, E., Borghetti, C., Brunello, M., Castagnoli, S., Dell’Orletta, F., Dittmann, H., Lenci, A., & Pirrelli, V. (2014). The PAISÀ Corpus of Italian Web Texts. In Proceedings of the 9th Web as Corpus Workshop (WaC-9). Association for Computational Linguistics, Gothenburg, Sweden, 36-43.
New, B., Pallier, C., Ferrand, L., & Matos, R. (2001). Une base de données lexicales du français contemporain sur internet: LEXIQUE 3.80. L’Année Psychologique, 101, 447-462. http://www.lexique.org.
Oh, Y. M. (2015). Linguistic complexity and information: quantitative approaches. PhD Thesis, Université de Lyon, France. Retrieved from http://www.afcp-parole.org/doc/theses/these_YMO15.pdf
Perea, M., Urkia, M., Davis, C. J., Agirre, A., Laseka, E., & Carreiras, M. (2006). E-Hitz: A word frequency list and a program for deriving psycholinguistic statistics in an agglutinative language (Basque). Behavior Research Methods, 38(4), 610-615.
Sharoff, S. (2006). Creating general-purpose corpora using automated search engine queries. In Baroni, M. and Bernardini, S. (Eds.) WaCky! Working papers on the web as corpus, Gedit, Bologna, http://corpus.leeds.ac.uk/queryzh.html.
Silverman, B.W. (1981). Using Kernel Density Estimates to investigate Multimodality. Journal of the Royal Statistical Society, Series B, 43, 97-99.
Váradi, T. (2002). The Hungarian National Corpus. In LREC.
Zséder, A., Recski, G., Varga, D., & Kornai, A. (2012). Rapid creation of large-scale corpora and frequency dictionaries. In Proceedings to LREC 2012.
R session info
This document was compiled on:
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=en_US.UTF-8, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8 and LC_IDENTIFICATION=C
attached base packages: grid, parallel, splines, stats, graphics, grDevices, datasets, utils, methods and base
other attached packages: broman(v.0.69-5), philentropy(v.0.3.0), pander(v.0.6.3), moments(v.0.14), sjPlot(v.2.6.3), sjstats(v.0.17.4), gamlss.mx(v.4.3-5), nnet(v.7.3-12), gamlss(v.5.1-3), nlme(v.3.1-139), gamlss.dist(v.5.1-3), MASS(v.7.3-51.4), gamlss.data(v.5.1-3), lmerTest(v.3.1-0), lme4(v.1.1-21), Matrix(v.1.2-17), plyr(v.1.8.4), reshape2(v.1.4.3), ggrepel(v.0.8.0), ggplot2(v.3.1.1) and RhpcBLASctl(v.0.18-205)
loaded via a namespace (and not attached): tidyr(v.0.8.3), modelr(v.0.1.4), assertthat(v.0.2.1), highr(v.0.8), yaml(v.2.2.0), bayestestR(v.0.1.0), numDeriv(v.2016.8-1), pillar(v.1.3.1), backports(v.1.1.4), lattice(v.0.20-38), glue(v.1.3.1), digest(v.0.6.18), glmmTMB(v.0.2.3), minqa(v.1.2.4), colorspace(v.1.4-1), sandwich(v.2.5-1), htmltools(v.0.3.6), psych(v.1.8.12), pkgconfig(v.2.0.2), broom(v.0.5.2), haven(v.2.1.0), purrr(v.0.3.2), xtable(v.1.8-4), mvtnorm(v.1.0-8), scales(v.1.0.0), emmeans(v.1.3.4), tibble(v.2.1.1), generics(v.0.0.2), sjlabelled(v.1.0.17), TH.data(v.1.0-10), withr(v.2.1.2), TMB(v.1.7.15), lazyeval(v.0.2.2), mnormt(v.1.5-5), survival(v.2.44-1.1), magrittr(v.1.5), crayon(v.1.3.4), estimability(v.1.3), evaluate(v.0.13), foreign(v.0.8-71), forcats(v.0.4.0), tools(v.3.4.4), hms(v.0.4.2), multcomp(v.1.4-8), stringr(v.1.4.0), munsell(v.0.5.0), ggeffects(v.0.9.0), compiler(v.3.4.4), rlang(v.0.3.4), nloptr(v.1.2.1), labeling(v.0.3), rmarkdown(v.1.12), boot(v.1.3-22), gtable(v.0.3.0), codetools(v.0.2-16), sjmisc(v.2.7.9), R6(v.2.4.0), zoo(v.1.8-5), knitr(v.1.22), dplyr(v.0.8.0.1), performance(v.0.1.0), insight(v.0.2.0), stringi(v.1.4.3), Rcpp(v.1.0.1), tidyselect(v.0.2.5), xfun(v.0.6) and coda(v.0.19-2)
Here we compare the Speech Rate (SR) used in the paper, defined as the canonical articulatory rate (the number of syllables in the canonical pronunciation of the text per second of speech), with an estimate of the realized speech rate based on the automatic detection of syllable nuclei, as implemented by the widely used algorithm described in De Jong, Nivja H., and Ton Wempe. “Praat script to detect syllable nuclei and measure speech rate automatically.” Behavior Research Methods 41.2 (2009): 385-390. We ran their algorithm with its standard parameters (except for pauses, which were taken from our manual annotation to match the main analysis) on the actual oral productions of our speakers. The results are available in the TAB-separated CSV file AutomaticSylDetect.csv with the following structure:
Number of syllables (NS), canonical (NS) vs automatic (NS.auto): Pearson’s and Spearman’s correlations and paired t-test:
Pearson's product-moment correlation
data: syll.data$NS and syll.data$NS.auto
t = 57.58, df = 2286, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.7520760 0.7855572
sample estimates:
cor
0.7693444
Spearman's rank correlation rho
data: syll.data$NS and syll.data$NS.auto
S = 455450000, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.7718458
Paired t-test
data: syll.data$NS and syll.data$NS.auto
t = 59.48, df = 2287, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
19.08661 20.38804
sample estimates:
mean of the differences
19.73733
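The three tests reported above can be reproduced along the following lines. This is a minimal sketch on simulated data, not the original analysis code: the names `syll.data`, `NS` and `NS.auto` come from the output above, whereas the simulated values and the helper `t_from_r` are illustrative assumptions. The last lines check that the reported t statistic follows from the reported Pearson's r through t = r·sqrt(df)/sqrt(1 − r²).

```r
# Simulated stand-in for syll.data (the real data are in AutomaticSylDetect.csv)
set.seed(1)
n  <- 200
NS <- round(runif(n, 80, 150))
syll.data <- data.frame(NS      = NS,
                        NS.auto = round(0.8 * NS + rnorm(n, sd = 10)))

# Pearson's and Spearman's correlations and the paired t-test
pearson  <- cor.test(syll.data$NS, syll.data$NS.auto, method = "pearson")
spearman <- cor.test(syll.data$NS, syll.data$NS.auto, method = "spearman",
                     exact = FALSE)  # ties rule out an exact p-value
paired   <- t.test(syll.data$NS, syll.data$NS.auto, paired = TRUE)

# t statistic implied by a Pearson correlation r with df = n - 2
t_from_r <- function(r, df) r * sqrt(df) / sqrt(1 - r^2)

# Using the values reported above (r = 0.7693444, df = 2286)
t_reported <- t_from_r(0.7693444, 2286)
round(t_reported, 2)  # 57.58, matching the Pearson output above
```

Note that a strong correlation and a significant paired t-test are compatible: the first concerns how the two counts co-vary, the second whether one is systematically larger than the other.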
NS: canonical (x axis) vs automatic (y axis) overall (black) and separately by language (colored).
NS: canonical (x axis) vs automatic (y axis) overall (black) and separately by text (colored).
NS: canonical (x axis) vs automatic (y axis) separately by text and language.
Speech rate (SR), canonical (SR) vs automatic (SR.auto): Pearson’s and Spearman’s correlations and paired t-test:
Pearson's product-moment correlation
data: syll.data$SR and syll.data$SR.auto
t = 27.135, df = 2286, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.4619458 0.5239624
sample estimates:
cor
0.4935813
Spearman's rank correlation rho
data: syll.data$SR and syll.data$SR.auto
S = 1059600000, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.4691952
Paired t-test
data: syll.data$SR and syll.data$SR.auto
t = 64.226, df = 2287, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
1.302039 1.384054
sample estimates:
mean of the differences
1.343047
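As a reading aid for the paired t-test output above, the 95% confidence interval can be recovered from the reported mean difference, t statistic and degrees of freedom. The sketch below only re-does this arithmetic with the printed (rounded) values; the variable names are illustrative.

```r
# Values copied from the paired t-test output for SR vs SR.auto above
mean_diff <- 1.343047   # mean of the differences
t_stat    <- 64.226     # t statistic
df        <- 2287       # degrees of freedom (number of pairs minus 1)

# Standard error of the mean difference, recovered from t = mean_diff / se
se <- mean_diff / t_stat

# Two-sided 95% confidence interval: mean_diff +/- t_crit * se
t_crit <- qt(0.975, df)
ci <- mean_diff + c(-1, 1) * t_crit * se
round(ci, 3)  # 1.302 1.384, matching the reported interval
```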
SR: canonical (x axis) vs automatic (y axis) overall (black) and separately by language (colored).
SR: canonical (x axis) vs automatic (y axis) overall (black) and separately by text (colored).
SR: canonical (x axis) vs automatic (y axis) separately by text and language.
SR: canonical (x axis) vs automatic (y axis) separately by speaker.
Plot, linear (mixed-effects) regression, correlation and paired t-tests:
SR: canonical (x axis) vs automatic (y axis) separately by language, with the regression line (black) and LOESS smoothing (yellow) and their 95% CIs.
Across languages:
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: SR.auto ~ SR + (1 | Family/Language) + (1 | Text) + (1 | Speaker)
Data: syll.data
REML criterion at convergence: 1044.7
Scaled residuals:
Min 1Q Median 3Q Max
-3.4153 -0.6165 -0.0139 0.6006 3.4939
Random effects:
Groups Name Variance Std.Dev.
Speaker (Intercept) 8.152e-02 2.855e-01
Language:Family (Intercept) 4.036e-02 2.009e-01
Text (Intercept) 1.510e-03 3.886e-02
Family (Intercept) 1.542e-10 1.242e-05
Residual 7.367e-02 2.714e-01
Number of obs: 2288, groups: Speaker, 170; Language:Family, 17; Text, 15; Family, 9
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 3.657e+00 1.167e-01 2.102e+02 31.34 <2e-16 ***
SR 2.425e-01 1.556e-02 1.269e+03 15.58 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
SR -0.883
convergence code: 0
boundary (singular) fit: see ?isSingular
Intraclass Correlation Coefficient for Linear mixed model
Family : gaussian (identity)
Formula: SR.auto ~ SR + (1 | Family/Language) + (1 | Text) + (1 | Speaker)
ICC (Speaker): 0.4137
ICC (Language:Family): 0.2048
ICC (Text): 0.0077
ICC (Family): 0.0000
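The intraclass correlation coefficients above are each random-effect variance divided by the total variance (all random-effect variances plus the residual). The following sketch re-does this arithmetic in base R from the variance components printed in the "Random effects" table; the vector name `vc` is illustrative, and the original analysis presumably obtained the ICCs from a package function rather than by hand.

```r
# Variance components copied from the "Random effects" table above
vc <- c(Speaker           = 8.152e-02,
        "Language:Family" = 4.036e-02,
        Text              = 1.510e-03,
        Family            = 1.542e-10,
        Residual          = 7.367e-02)

# ICC of a grouping factor = its variance / total variance (incl. residual)
icc <- vc[names(vc) != "Residual"] / sum(vc)
round(icc, 4)  # Speaker 0.4137, Language:Family 0.2048, Text 0.0077, Family 0
```

The Family variance is estimated at essentially zero, which is what the "boundary (singular) fit" message refers to.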
For each language separately:
For *CAT*
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
Data: d
REML criterion at convergence: 31.8
Scaled residuals:
Min 1Q Median 3Q Max
-2.0701 -0.6506 -0.0219 0.4436 3.1370
Random effects:
Groups Name Variance Std.Dev.
Text (Intercept) 0.01294 0.1137
Speaker (Intercept) 0.11321 0.3365
Residual 0.04915 0.2217
Number of obs: 150, groups: Text, 15; Speaker, 10
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 3.6072 0.4748 141.6269 7.598 3.74e-12 ***
SR 0.2751 0.0653 146.8828 4.213 4.38e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
SR -0.972
Intraclass Correlation Coefficient for Linear mixed model
Family : gaussian (identity)
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
ICC (Text): 0.0738
ICC (Speaker): 0.6458
For *CMN*
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
Data: d
REML criterion at convergence: 70.4
Scaled residuals:
Min 1Q Median 3Q Max
-2.83050 -0.69715 -0.01563 0.77666 2.44202
Random effects:
Groups Name Variance Std.Dev.
Text (Intercept) 0.009625 0.09811
Speaker (Intercept) 0.076741 0.27702
Residual 0.069251 0.26316
Number of obs: 150, groups: Text, 15; Speaker, 10
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 3.30672 0.45369 76.44565 7.289 2.43e-10 ***
SR 0.31115 0.07578 83.31162 4.106 9.36e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
SR -0.978
Intraclass Correlation Coefficient for Linear mixed model
Family : gaussian (identity)
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
ICC (Text): 0.0619
ICC (Speaker): 0.4931
For *DEU*
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
Data: d
REML criterion at convergence: 30.5
Scaled residuals:
Min 1Q Median 3Q Max
-3.1814 -0.5561 0.1037 0.4896 1.7387
Random effects:
Groups Name Variance Std.Dev.
Text (Intercept) 0.008901 0.09435
Speaker (Intercept) 0.049761 0.22307
Residual 0.056883 0.23850
Number of obs: 75, groups: Text, 15; Speaker, 10
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 3.27935 0.40582 19.85609 8.081 1.05e-07 ***
SR 0.25498 0.06536 20.50531 3.901 0.000854 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
SR -0.980
Intraclass Correlation Coefficient for Linear mixed model
Family : gaussian (identity)
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
ICC (Text): 0.0770
ICC (Speaker): 0.4307
For *ENG*
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
Data: d
REML criterion at convergence: 16
Scaled residuals:
Min 1Q Median 3Q Max
-1.58999 -0.59293 0.02286 0.47987 2.70002
Random effects:
Groups Name Variance Std.Dev.
Text (Intercept) 0.025424 0.1594
Speaker (Intercept) 0.001998 0.0447
Residual 0.051065 0.2260
Number of obs: 60, groups: Text, 15; Speaker, 10
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 2.8381 0.3514 8.2792 8.076 3.35e-05 ***
SR 0.3293 0.0547 8.4683 6.021 0.000252 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
SR -0.989
Intraclass Correlation Coefficient for Linear mixed model
Family : gaussian (identity)
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
ICC (Text): 0.3239
ICC (Speaker): 0.0255
For *EUS*
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
Data: d
REML criterion at convergence: 88.2
Scaled residuals:
Min 1Q Median 3Q Max
-2.5915 -0.6106 -0.0515 0.5333 2.1938
Random effects:
Groups Name Variance Std.Dev.
Text (Intercept) 0.02698 0.1643
Speaker (Intercept) 0.05815 0.2411
Residual 0.07462 0.2732
Number of obs: 150, groups: Text, 15; Speaker, 10
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 3.45341 0.47576 68.91222 7.259 4.54e-10 ***
SR 0.29879 0.06196 74.49606 4.823 7.31e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
SR -0.982
convergence code: 0
Model failed to converge with max|grad| = 0.00271745 (tol = 0.002, component 1)
Intraclass Correlation Coefficient for Linear mixed model
Family : gaussian (identity)
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
ICC (Text): 0.1689
ICC (Speaker): 0.3640
For *FIN*
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
Data: d
REML criterion at convergence: 94
Scaled residuals:
Min 1Q Median 3Q Max
-2.77451 -0.71674 -0.06606 0.55614 2.37964
Random effects:
Groups Name Variance Std.Dev.
Text (Intercept) 0.02740 0.1655
Speaker (Intercept) 0.08983 0.2997
Residual 0.07605 0.2758
Number of obs: 150, groups: Text, 15; Speaker, 10
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 4.12836 0.55589 103.30284 7.427 3.3e-11 ***
SR 0.16954 0.07608 110.66493 2.228 0.0279 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
SR -0.982
Intraclass Correlation Coefficient for Linear mixed model
Family : gaussian (identity)
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
ICC (Text): 0.1417
ICC (Speaker): 0.4648
For *FRA*
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
Data: d
REML criterion at convergence: 70.4
Scaled residuals:
Min 1Q Median 3Q Max
-3.00987 -0.62700 0.09629 0.62996 3.09995
Random effects:
Groups Name Variance Std.Dev.
Text (Intercept) 0.02005 0.1416
Speaker (Intercept) 0.05159 0.2271
Residual 0.06725 0.2593
Number of obs: 150, groups: Text, 15; Speaker, 10
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 3.84709 0.45106 83.05359 8.529 5.7e-13 ***
SR 0.25609 0.06447 89.52475 3.972 0.000144 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
SR -0.983
Intraclass Correlation Coefficient for Linear mixed model
Family : gaussian (identity)
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
ICC (Text): 0.1444
ICC (Speaker): 0.3715
For *HUN*
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
Data: d
REML criterion at convergence: -8.8
Scaled residuals:
Min 1Q Median 3Q Max
-3.2830 -0.5859 -0.0511 0.6111 2.8044
Random effects:
Groups Name Variance Std.Dev.
Text (Intercept) 0.001638 0.04047
Speaker (Intercept) 0.055970 0.23658
Residual 0.042307 0.20569
Number of obs: 150, groups: Text, 15; Speaker, 10
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 2.80958 0.38240 44.61013 7.347 3.28e-09 ***
SR 0.40933 0.06382 46.94630 6.414 6.36e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
SR -0.979
Intraclass Correlation Coefficient for Linear mixed model
Family : gaussian (identity)
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
ICC (Text): 0.0164
ICC (Speaker): 0.5602
For *ITA*
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
Data: d
REML criterion at convergence: -3.2
Scaled residuals:
Min 1Q Median 3Q Max
-1.84258 -0.54979 -0.02642 0.70407 1.58134
Random effects:
Groups Name Variance Std.Dev.
Text (Intercept) 0.00000 0.0000
Speaker (Intercept) 0.04278 0.2068
Residual 0.03439 0.1854
Number of obs: 54, groups: Text, 15; Speaker, 10
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 2.25739 0.38812 16.47267 5.816 2.34e-05 ***
SR 0.42568 0.05386 17.06472 7.903 4.20e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
SR -0.983
convergence code: 0
boundary (singular) fit: see ?isSingular
Intraclass Correlation Coefficient for Linear mixed model
Family : gaussian (identity)
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
ICC (Text): 0.0000
ICC (Speaker): 0.5543
For *JPN*
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
Data: d
REML criterion at convergence: 53.3
Scaled residuals:
Min 1Q Median 3Q Max
-2.52220 -0.60009 -0.04425 0.55847 2.53058
Random effects:
Groups Name Variance Std.Dev.
Text (Intercept) 0.04229 0.2056
Speaker (Intercept) 0.06345 0.2519
Residual 0.05458 0.2336
Number of obs: 150, groups: Text, 15; Speaker, 10
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 3.5877 0.6604 105.4468 5.432 3.6e-07 ***
SR 0.1836 0.0813 110.3256 2.258 0.0259 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
SR -0.989
Intraclass Correlation Coefficient for Linear mixed model
Family : gaussian (identity)
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
ICC (Text): 0.2638
ICC (Speaker): 0.3958
For *KOR*
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
Data: d
REML criterion at convergence: 36.8
Scaled residuals:
Min 1Q Median 3Q Max
-2.18322 -0.60061 -0.05576 0.67550 3.04396
Random effects:
Groups Name Variance Std.Dev.
Text (Intercept) 0.02043 0.1429
Speaker (Intercept) 0.07645 0.2765
Residual 0.05048 0.2247
Number of obs: 150, groups: Text, 15; Speaker, 10
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 2.91873 0.44492 78.41047 6.560 5.25e-09 ***
SR 0.34066 0.06101 88.83628 5.584 2.53e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
SR -0.976
Intraclass Correlation Coefficient for Linear mixed model
Family : gaussian (identity)
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
ICC (Text): 0.1387
ICC (Speaker): 0.5188
For *SPA*
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
Data: d
REML criterion at convergence: 63.8
Scaled residuals:
Min 1Q Median 3Q Max
-2.34549 -0.53648 -0.00435 0.60902 2.43317
Random effects:
Groups Name Variance Std.Dev.
Text (Intercept) 0.01698 0.1303
Speaker (Intercept) 0.18965 0.4355
Residual 0.05947 0.2439
Number of obs: 150, groups: Text, 15; Speaker, 10
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 4.28465 0.54626 121.79396 7.844 1.88e-12 ***
SR 0.19527 0.06823 118.13840 2.862 0.00498 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
SR -0.965
Intraclass Correlation Coefficient for Linear mixed model
Family : gaussian (identity)
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
ICC (Text): 0.0638
ICC (Speaker): 0.7127
For *SRP*
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
Data: d
REML criterion at convergence: 48.7
Scaled residuals:
Min 1Q Median 3Q Max
-2.8756 -0.6255 -0.1238 0.6560 2.1764
Random effects:
Groups Name Variance Std.Dev.
Text (Intercept) 0.03592 0.1895
Speaker (Intercept) 0.06958 0.2638
Residual 0.05301 0.2302
Number of obs: 150, groups: Text, 15; Speaker, 10
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 3.69692 0.49182 78.87158 7.517 7.63e-11 ***
SR 0.22437 0.06736 86.05809 3.331 0.00128 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
SR -0.980
Intraclass Correlation Coefficient for Linear mixed model
Family : gaussian (identity)
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
ICC (Text): 0.2266
ICC (Speaker): 0.4390
For *THA*
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
Data: d
REML criterion at convergence: 31.1
Scaled residuals:
Min 1Q Median 3Q Max
-2.65271 -0.62596 -0.00861 0.63282 2.20438
Random effects:
Groups Name Variance Std.Dev.
Text (Intercept) 0.00178 0.04219
Speaker (Intercept) 0.07571 0.27515
Residual 0.05556 0.23571
Number of obs: 150, groups: Text, 15; Speaker, 10
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 3.17088 0.35727 91.31262 8.875 5.61e-14 ***
SR 0.37101 0.07357 99.07802 5.043 2.07e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
SR -0.968
Intraclass Correlation Coefficient for Linear mixed model
Family : gaussian (identity)
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
ICC (Text): 0.0134
ICC (Speaker): 0.5690
For *TUR*
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
Data: d
REML criterion at convergence: 48.5
Scaled residuals:
Min 1Q Median 3Q Max
-2.12994 -0.62360 -0.02388 0.60274 2.19757
Random effects:
Groups Name Variance Std.Dev.
Text (Intercept) 0.02108 0.1452
Speaker (Intercept) 0.07852 0.2802
Residual 0.05500 0.2345
Number of obs: 149, groups: Text, 15; Speaker, 10
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 4.23116 0.41457 70.03009 10.206 1.69e-15 ***
SR 0.14471 0.05716 77.68545 2.532 0.0134 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
SR -0.972
Intraclass Correlation Coefficient for Linear mixed model
Family : gaussian (identity)
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
ICC (Text): 0.1364
ICC (Speaker): 0.5079
For *VIE*
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
Data: d
REML criterion at convergence: 47.6
Scaled residuals:
Min 1Q Median 3Q Max
-3.0095 -0.5369 -0.0324 0.5574 2.8577
Random effects:
Groups Name Variance Std.Dev.
Text (Intercept) 0.005367 0.07326
Speaker (Intercept) 0.100293 0.31669
Residual 0.059273 0.24346
Number of obs: 150, groups: Text, 15; Speaker, 10
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 2.39971 0.43084 71.69691 5.570 4.19e-07 ***
SR 0.50664 0.07884 84.71528 6.426 7.33e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
SR -0.971
Intraclass Correlation Coefficient for Linear mixed model
Family : gaussian (identity)
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
ICC (Text): 0.0325
ICC (Speaker): 0.6081
For *YUE*
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
Data: d
REML criterion at convergence: 3.5
Scaled residuals:
Min 1Q Median 3Q Max
-3.11616 -0.57419 -0.04198 0.62276 2.30008
Random effects:
Groups Name Variance Std.Dev.
Text (Intercept) 0.01674 0.1294
Speaker (Intercept) 0.03228 0.1797
Residual 0.04201 0.2050
Number of obs: 150, groups: Text, 15; Speaker, 10
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 2.3392 0.3532 144.9789 6.623 6.41e-10 ***
SR 0.4951 0.0622 147.6043 7.960 4.20e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
SR -0.981
Intraclass Correlation Coefficient for Linear mixed model
Family : gaussian (identity)
Formula: SR.auto ~ SR + (1 | Text) + (1 | Speaker)
ICC (Text): 0.1839
ICC (Speaker): 0.3546
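The per-language models above all share the formula `SR.auto ~ SR + (1 | Text) + (1 | Speaker)`. A minimal sketch of such a fit is given below, assuming the `lme4` package is installed and using simulated data for a single hypothetical language (10 speakers reading 15 texts); all simulated values are assumptions. The outputs above were produced with `lmerTest`, which adds the Satterthwaite t-tests; plain `lme4` is used here, so no p-values are printed, and the ICCs are computed by hand from the variance components.

```r
library(lme4)

# Simulated stand-in for one language: 10 speakers x 15 texts
set.seed(42)
d   <- expand.grid(Speaker = factor(1:10), Text = factor(1:15))
spk <- rnorm(10, sd = 0.5)   # speaker-specific rate offsets
txt <- rnorm(15, sd = 0.2)   # text-specific rate offsets
d$SR      <- 6.5 + spk[d$Speaker] + txt[d$Text] + rnorm(nrow(d), sd = 0.3)
d$SR.auto <- 3.5 + 0.3 * d$SR + 0.3 * spk[d$Speaker] + rnorm(nrow(d), sd = 0.25)

# Same formula as in the per-language output above
m <- lmer(SR.auto ~ SR + (1 | Text) + (1 | Speaker), data = d, REML = TRUE)

fixef(m)        # intercept and slope of SR.auto on SR
isSingular(m)   # TRUE would flag a variance component estimated at zero

# ICCs computed directly from the estimated variance components
vc  <- as.data.frame(VarCorr(m))
icc <- setNames(vc$vcov / sum(vc$vcov), vc$grp)
icc[names(icc) != "Residual"]
```

Checking `isSingular(m)` is useful here because some of the fits above (for example ITA, where the Text variance is 0) sit on the boundary of the parameter space.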
| Language | Pearson’s r | Spearman’s rho | Paired t-test | Intercept | Slope |
|---|---|---|---|---|---|
| all | r=0.49 (p=8.445e-141) | rho=0.47 (p=1.327e-125) | t(2287.0)=64.23 (p<2.2e-16) | 3.66 (p=3.72e-81) | 0.24 (p=3.1e-50) |
| CAT | r=0.21 (p=0.009915) | rho=0.12 (p=0.1506) | t(149.0)=32.05 (p=9.71e-69) | 3.61 (p=3.74e-12) | 0.28 (p=4.38e-05) |
| CMN | r=0.61 (p=1.557e-16) | rho=0.58 (p=4.102e-15) | t(149.0)=17.88 (p=6.52e-39) | 3.31 (p=2.43e-10) | 0.31 (p=9.36e-05) |
| DEU | r=0.54 (p=6.05e-07) | rho=0.58 (p=3.847e-08) | t(74.0)=13.19 (p=4.09e-21) | 3.28 (p=1.05e-07) | 0.25 (p=0.000854) |
| ENG | r=0.60 (p=5.205e-07) | rho=0.60 (p=3.885e-07) | t(59.0)=20.08 (p=4.63e-28) | 2.84 (p=3.35e-05) | 0.33 (p=0.000252) |
| EUS | r=0.58 (p=5.564e-15) | rho=0.57 (p=1.539e-14) | t(149.0)=34.65 (p=3.43e-73) | 3.45 (p=4.54e-10) | 0.30 (p=7.31e-06) |
| FIN | r=0.16 (p=0.0509) | rho=0.12 (p=0.1481) | t(149.0)=32.16 (p=6.37e-69) | 4.13 (p=3.3e-11) | 0.17 (p=0.0279) |
| FRA | r=0.41 (p=2.346e-07) | rho=0.43 (p=3.183e-08) | t(149.0)=24.94 (p=4.79e-55) | 3.85 (p=5.7e-13) | 0.26 (p=0.000144) |
| HUN | r=0.73 (p=5.921e-26) | rho=0.76 (p=8.644e-30) | t(149.0)=16.80 (p=3.31e-36) | 2.81 (p=3.28e-09) | 0.41 (p=6.36e-08) |
| ITA | r=0.92 (p=2.878e-22) | rho=0.93 (p=0) | t(53.0)=24.40 (p=1.7e-30) | 2.26 (p=2.34e-05) | 0.43 (p=4.2e-07) |
| JPN | r=-0.07 (p=0.3829) | rho=-0.05 (p=0.5309) | t(149.0)=55.12 (p=5.28e-101) | 3.59 (p=3.6e-07) | 0.18 (p=0.0259) |
| KOR | r=0.43 (p=3.817e-08) | rho=0.48 (p=3.552e-10) | t(149.0)=30.44 (p=7.77e-66) | 2.92 (p=5.25e-09) | 0.34 (p=2.53e-07) |
| SPA | r=0.10 (p=0.2028) | rho=0.16 (p=0.05311) | t(149.0)=36.69 (p=1.66e-76) | 4.28 (p=1.88e-12) | 0.20 (p=0.00498) |
| SRP | r=0.51 (p=2.448e-11) | rho=0.47 (p=1.443e-09) | t(149.0)=36.54 (p=2.91e-76) | 3.70 (p=7.63e-11) | 0.22 (p=0.00128) |
| THA | r=0.67 (p=1.331e-20) | rho=0.64 (p=0) | t(149.0)=-6.94 (p=1.1e-10) | 3.17 (p=5.61e-14) | 0.37 (p=2.07e-06) |
| TUR | r=0.16 (p=0.0467) | rho=0.16 (p=0.05116) | t(148.0)=25.12 (p=3.01e-55) | 4.23 (p=1.69e-15) | 0.14 (p=0.0134) |
| VIE | r=0.67 (p=3.979e-21) | rho=0.61 (p=8.736e-17) | t(149.0)=5.10 (p=1.03e-06) | 2.40 (p=4.19e-07) | 0.51 (p=7.33e-09) |
| YUE | r=0.45 (p=5.825e-09) | rho=0.43 (p=4.186e-08) | t(149.0)=15.22 (p=3.85e-32) | 2.34 (p=6.41e-10) | 0.50 (p=4.2e-13) |
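The correlation and t-test columns of a table like the one above could be assembled with a loop over languages. The sketch below uses base R only and simulated data with two hypothetical language codes (`AAA`, `BBB`); the helper `summarise_sr` and the cell formatting are assumptions modelled on the table, not the original code, and the mixed-model columns (Intercept, Slope) are left out.

```r
# Simulated stand-in for syll.data with two hypothetical languages
set.seed(7)
syll.data <- data.frame(Language = rep(c("AAA", "BBB"), each = 60),
                        stringsAsFactors = FALSE)
syll.data$SR      <- rnorm(nrow(syll.data), mean = 6.5, sd = 0.8)
syll.data$SR.auto <- 3.5 + 0.3 * syll.data$SR + rnorm(nrow(syll.data), sd = 0.4)

# One row of summary statistics for a (subset of the) data
summarise_sr <- function(d, label) {
  p  <- cor.test(d$SR, d$SR.auto, method = "pearson")
  s  <- cor.test(d$SR, d$SR.auto, method = "spearman", exact = FALSE)
  tt <- t.test(d$SR, d$SR.auto, paired = TRUE)
  data.frame(
    Language = label,
    Pearson  = sprintf("r=%.2f (p=%.4g)", p$estimate, p$p.value),
    Spearman = sprintf("rho=%.2f (p=%.4g)", s$estimate, s$p.value),
    Paired.t = sprintf("t(%.1f)=%.2f (p=%.3g)",
                       tt$parameter, tt$statistic, tt$p.value),
    stringsAsFactors = FALSE)
}

# First row for all data, then one row per language
rows <- c(list(summarise_sr(syll.data, "all")),
          unname(lapply(split(syll.data, syll.data$Language),
                        function(d) summarise_sr(d, as.character(d$Language[1])))))
summary_table <- do.call(rbind, rows)
summary_table
```

With this kind of formatting, p-values too small to be represented as a double print as 0; base R's `format.pval()` is one way to report them as an upper bound instead.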
Here we generate the figures used in the main paper (saved to the ./figures folder as 600 DPI TIFF files Figure-*.tiff).